Policy Gradient Reinforcement Learning in PyTorch

I think one of the best ways to learn a new topic is to explain it as simply as possible, so that someone with no experience can understand it (a.k.a. the Feynman Technique). This post is my attempt to do that with policy gradient reinforcement learning.

I’m new to reinforcement learning, so if I made a mistake or you have a question, let me know and I’ll correct the article or try to provide a better explanation.