Why, if I have an agent with, for example, 3000 max steps, and I run my training in my 144-agent environment, do I not get an episode after 30000 steps? What is the relation between them? I know time_horizon has something to do with it, but there has to be something more, like more agents meaning more steps.
For Agent.MaxStep, this is defined in terms of Academy steps (which by default correspond to FixedUpdate calls in the engine).
For the purposes of summary_freq, max_steps, etc. in the trainer configuration, these refer to agent steps, not Academy steps. So you would need 144*3000 of these steps to ensure every agent completes an episode.
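In other words, one full round of episodes across all agents costs 144 agents × 3000 steps per agent = 432,000 agent steps.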
And how can I give it those 144*3000 steps, since normally it stops before that? Should I change time_horizon, or are you just saying that it's better to watch the episodes over 144*3000 steps with the config that I have now?
By the way, the agents always have to run the full 3000 steps each.
time_horizon won’t affect your episode completion; the trainer will still wait for an episode to actually finish before recording the reward. I’d set max_steps in your trainer configuration to MUCH more than 144*3000, e.g. 30,000,000 or more. Then, set your summary_freq parameter to a number greater than 144*3000 (432,000). You should then see periodic summaries of reward during your training. You can quit training early by hitting CTRL+C, even if your trainer’s max_steps is a large number.
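For reference, here’s a minimal sketch of what that could look like in the trainer config YAML, assuming the `behaviors:` style config format; `MyBehavior` is a placeholder for your agent’s actual Behavior Name, and the values are illustrative, not tuned:

```yaml
behaviors:
  MyBehavior:              # placeholder: replace with your Behavior Name
    trainer_type: ppo
    max_steps: 30000000    # total agent steps; much more than 144*3000
    summary_freq: 500000   # > 432000, so each summary covers at least
                           # one full 3000-step episode from every agent
    time_horizon: 64       # how experience is chunked for training;
                           # it does not cut episodes short
```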
Thank you soooo much guys!!