BASIC DISCRETE MODEL NOT WORKING

Sorry if this is too basic, I’m a beginner with ML-Agents.
I built a basic model with one discrete action branch of size two. Both values (0 and 1) can earn positive rewards depending on the environment values. The issue is that training settles on 0: the agent learns that 0 is the only correct value and rarely uses 1 anymore, even though there are still rewards for using 1. Basically the model learns, but only one action. It observes the current action (1 observation) and the environment values (2 observations). I’ve tried:

  • playing with the hyperparameters, especially beta, batch and buffer size, normalize, and learning rate, with no success
  • PPO and SAC
  • adding rewards of +1 and -0.1, which seems to work better, but still nothing
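For context, this is roughly the shape of the trainer config I’ve been tuning (the behavior name and values here are placeholders, not my exact settings):

```yaml
behaviors:
  MyAgent:                  # placeholder behavior name
    trainer_type: ppo       # also tried sac
    hyperparameters:
      batch_size: 64
      buffer_size: 2048
      beta: 5.0e-3          # entropy bonus; raising this encourages exploration
      learning_rate: 3.0e-4
    network_settings:
      normalize: true
```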

Any help would be appreciated. Thanks!


If it’s getting penalised for taking more actions (or taking more time), it’s likely just finding the way to get the most reward in the shortest time.
It’s hard to tell what the issue is without seeing the full picture, but:
if you want it to explore more actions, you need to reward that behaviour - e.g. add a small reward each time a unique action is taken
you can also mask individual actions so they can’t be chosen (a discrete action mask) - this can help block action 0 from being chosen after it has been chosen too often, etc.
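The unique-action reward idea is just reward shaping on top of the environment reward. A minimal sketch of the concept (plain Python, not ML-Agents API; the class name, bonus size, and decay threshold are all assumptions you’d tune):

```python
from collections import Counter

class UniqueActionBonus:
    """Adds a small bonus the first few times each discrete action is taken,
    so under-used actions stay attractive early in training.
    Illustrative sketch only; values are placeholders."""

    def __init__(self, bonus=0.05, max_rewarded_uses=10):
        self.bonus = bonus
        self.max_rewarded_uses = max_rewarded_uses
        self.counts = Counter()  # how often each action has been taken

    def shaped_reward(self, action, env_reward):
        self.counts[action] += 1
        # Reward novelty: the bonus disappears once an action has been used often.
        if self.counts[action] <= self.max_rewarded_uses:
            return env_reward + self.bonus
        return env_reward
```

In an ML-Agents agent you would apply this inside `OnActionReceived`, adding the shaped reward instead of the raw environment reward; once the policy stops collapsing you can shrink the bonus toward zero.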

I’ll try that, but I actually simplified the code just to isolate the problem, and it’s still not working properly. See attached.


[Attached screenshot: Screen Shot 2022-12-05 at 5.42.44 PM.png]