I’ve attached a screenshot of my training run. I’ve let it run for over 10 million steps, but there is no correlation, which tells me my model isn’t learning anything. Do you know why this is, or what I could do to improve it? Each episode is the same length.
I’d need more information to help debug your issue. First, I would confirm that your agent can actually reach rewarding states, and that the termination conditions work properly, e.g. that EndEpisode is actually being called, either when you drive the agent yourself with keyboard controls or during training.
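If it helps, here is a rough sketch of how you could run both checks on logged episode data. It assumes you record the step count and cumulative reward of each episode yourself; `EpisodeRecord`, `diagnose`, and `max_step` are hypothetical names made up for this example and are not part of the ML-Agents API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpisodeRecord:
    """One logged episode (hypothetical format, not an ML-Agents type)."""
    steps: int            # number of agent steps taken in the episode
    total_reward: float   # cumulative reward collected in the episode


def diagnose(episodes: List[EpisodeRecord], max_step: int) -> List[str]:
    """Return human-readable warnings about a batch of logged episodes."""
    if not episodes:
        return ["no episodes logged"]
    warnings = []
    # No reward in any episode: the agent may never reach a rewarding state.
    if all(ep.total_reward == 0.0 for ep in episodes):
        warnings.append("no reward was ever collected")
    # Every episode hits the step limit: the termination conditions may never fire.
    if all(ep.steps >= max_step for ep in episodes):
        warnings.append("every episode ran to the step limit")
    return warnings


if __name__ == "__main__":
    # Three identical episodes that time out without any reward.
    logs = [EpisodeRecord(steps=1000, total_reward=0.0)] * 3
    for warning in diagnose(logs, max_step=1000):
        print(warning)
```

Since you mention every episode is the same length, the second warning is the one I would look at first: it would suggest the episodes only end at the step limit and never through your own termination logic. That is only a guess until you share more details.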