Hello everyone,
this is a question for those users who've managed to build up intuition about reading TensorBoard graphs.
Or maybe for those who have collected data from enough different runs that they can look back at their results for different parameters.
How are learning rate, epsilon, and policy loss related?
I understand that the "learning rate" dictates how much the policy changes per update.
"Epsilon" caps that change, so the policy doesn't change by more than epsilon allows.
And I thought that the "policy loss" graph in TensorBoard gives an indication of how much the policy changed.
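For reference, this is roughly how I picture the policy loss being computed, a minimal sketch of the standard PPO clipped objective in PyTorch (the function and tensor names are just placeholders of mine, not the actual trainer code):

```python
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    # Ratio between the updated policy and the policy that collected the data
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped surrogate and the version clipped to [1 - epsilon, 1 + epsilon]
    surrogate = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Policy loss = negative mean of the pessimistic (element-wise minimum) surrogate
    return -torch.min(surrogate, clipped).mean()
```

If that picture is right, epsilon enters the loss directly, while the learning rate only scales the gradient step the optimizer takes afterwards; maybe that distinction is already where my intuition breaks down.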
However, with the settings I tried (increasing the learning rate from 0.0003 to 0.003 and epsilon from 0.2 to 0.4), the "policy loss" always stays at an average of about 6E-3. It oscillates a bit, and my 20M steps are probably not enough for the policy loss to drop significantly in my environment. But I would have expected the "policy loss" to start at, or stay at, a higher level with higher values for learning rate and epsilon.
So, at which point does my reasoning go wrong?