ML-Agents Issue

Hello.

I have just started using ML-Agents. I went through the penguin example and it all worked fine, so I have now started on my own game, just to see if I can do all of it myself (I'm new to Unity too! :smile:).

So I made a very simple game (screenshots linked below) where the player (a cube) has to collect 5 coins that are randomly placed in the level.

I have gone through and made the academy, agent, and area scripts. (For now I am using the same ML-Agents version as the penguin tutorial, 0.13.1, with curriculum learning.) This is in my YAML trainer config:

PlayerLearning:
    summary_freq: 5000
    time_horizon: 128
    batch_size: 128
    buffer_size: 128
    hidden_units: 256
    beta: 1.0e-2
    max_steps: 1.0e6

and the JSON curriculum is this:

{ "measure": "reward","thresholds": [-0.1,0.7,1.7,1.7,1.7,2.7,2.7],"min_lesson_length": 80, "signal_smoothing": true,"parameters": {"coin_speed": [0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5]}}

I can move the player around fine using heuristics, but when I try to train, both elements of vectorAction (in my AgentAction override) are always 0, so the player never moves.
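
For context, my movement code follows this pattern (a simplified sketch, not my exact code; moveSpeed and the Translate call are illustrative, and the full version is in the GitHub link below):

using MLAgents;
using UnityEngine;

public class PlayerAgent : Agent
{
    public float moveSpeed = 5f;

    // vectorAction should hold two floats in [-1, 1] during training.
    public override void AgentAction(float[] vectorAction)
    {
        Vector3 move = new Vector3(vectorAction[0], 0f, vectorAction[1]);
        transform.Translate(move * moveSpeed * Time.fixedDeltaTime);
    }

    // Keyboard control, used when Behavior Type is "Heuristic Only".
    public override float[] Heuristic()
    {
        return new float[] { Input.GetAxis("Horizontal"), Input.GetAxis("Vertical") };
    }
}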

Does anyone know why this is? Apologies if this is a stupid question; as I said, I'm new to Unity and ML-Agents. Thanks a lot.

Code: ML-Agents · GitHub

Images: (three Imgur screenshots)

Can you try printing the vectorAction values directly from AgentAction (they should be floats)?
It seems strange that they would all be 0 in this case. Did you make sure heuristics are disabled (it could be that your agent's Behavior Type is still set to "Heuristic Only" in the scene)? Is the training process running normally? Is the reward moving at all?
Are you maybe using action masks?
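
For example, something like this at the top of AgentAction would show exactly what the trainer is sending each step (DebugAgent is just a placeholder for your own Agent subclass):

using MLAgents;
using UnityEngine;

public class DebugAgent : Agent
{
    public override void AgentAction(float[] vectorAction)
    {
        // Log the raw action values received from the trainer.
        Debug.Log($"action[0]={vectorAction[0]}, action[1]={vectorAction[1]}");
        // ...movement code goes here...
    }
}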

Yep, they are both 0. The Behavior Type is set to Default (so not Heuristic Only). The reward is increasing, but the agent is not actually moving around (and I did test: movement works when Heuristic Only is on).

Could you open an issue on the GitHub repository and include minimal steps to reproduce the bug (does it happen with one of the example environments, or only with a toy environment)?