Hi,
As the title asks, is the ENTIRE experience buffer cleared when using curriculum learning?
If not, the buffer can hold the same state once with a reward and once without. That might be a problem, mightn't it?
Thanks
No, the replay buffer is not cleared in curriculum learning.
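Does your reward function change between lessons? We recommend NOT using SAC in this situation, because changing the reward function effectively creates a new experience distribution, while the older experiences still in the buffer carry rewards from the previous lesson. While it's better not to have drastic changes in the environment, PPO should respond better in those situations, since it is on-policy and trains on recently collected experience.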
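
To make the concern concrete, here is a toy sketch in plain Python. It is not the trainer's actual SAC code: `ReplayBuffer`, the two reward functions, and the integer states are all hypothetical stand-ins. It only shows that a buffer kept across lessons can end up with the same state-action pair stored under two different rewards.

```python
from collections import deque


class ReplayBuffer:
    """Hypothetical minimal buffer: stores (state, action, reward) tuples."""

    def __init__(self, capacity):
        # Oldest entries are dropped only once capacity is exceeded.
        self.data = deque(maxlen=capacity)

    def add(self, state, action, reward):
        self.data.append((state, action, reward))


def reward_lesson_0(state):
    # Assumed easy lesson: reaching state 3 or beyond is rewarded.
    return 1.0 if state >= 3 else 0.0


def reward_lesson_1(state):
    # Assumed harder lesson: the threshold moves to state 5.
    return 1.0 if state >= 5 else 0.0


def fill(buffer, reward_fn, states):
    # Record one transition per state, always with action 0 for simplicity.
    for s in states:
        buffer.add(s, 0, reward_fn(s))


def conflicting_states(buffer):
    # Collect the set of rewards seen for each (state, action) pair and
    # report the states that were stored with more than one reward.
    rewards_by_key = {}
    for s, a, r in buffer.data:
        rewards_by_key.setdefault((s, a), set()).add(r)
    return sorted(k[0] for k, v in rewards_by_key.items() if len(v) > 1)


buffer = ReplayBuffer(capacity=100)
fill(buffer, reward_lesson_0, range(8))  # lesson 0 experience
fill(buffer, reward_lesson_1, range(8))  # lesson 1 experience, buffer not cleared
conflicts = conflicting_states(buffer)
print(conflicts)  # [3, 4]
```

States 3 and 4 appear with reward 1.0 from the first lesson and 0.0 from the second, so an off-policy learner sampling uniformly from this buffer would see contradictory targets for them until the old entries are pushed out.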