In previous years, the exercise has been on policy gradients. Since policy gradients are not part of the curriculum this year, I suggest you work through the Q-learning tutorial instead.
Lecture:
Exercise: Q-learning
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
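The linked tutorial trains a deep Q-network (DQN) with PyTorch. Before reading it, it may help to see the plain Q-learning update that a DQN approximates with a neural network: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). Below is a minimal tabular sketch of that update. It is not code from the tutorial: the five-cell "chain" environment, the step() helper and the hyperparameter values (alpha, gamma, epsilon, number of episodes) are all hypothetical, chosen only to keep the example small and dependency-free.

```python
import random

# Hypothetical toy environment (not from the tutorial): a 1-D chain of
# N_STATES cells. The agent starts in cell 0; action 0 moves left, action 1
# moves right. Reaching the last cell gives reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection; ties between equal Q-values
            # are broken at random so the agent does not get stuck early on.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # TD target: bootstrap from the greedy value of the next state,
            # except at the terminal state, where there is no future reward.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    # Greedy policy for the non-terminal cells.
    policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
    print(policy)
```

Running the script should print a greedy policy that moves right in every non-terminal cell. In the tutorial, the table q is replaced by a neural network that maps a state to one Q-value per action, which is what lets the same update scale to environments whose states cannot be enumerated in a table.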
Exercise (Optional): Policy gradients
You may need to install gym: pip install gym