Here is the environment I am working on to give you an idea (now the jump flying is fixed):
I realized during a hyperparameter sweep that the number of agents and Time.timeScale affect the final results immensely, with reward performance suffering when either one is too large.
This suggested that resource constraints on the machine might be affecting training. That went against my intuition, so I decided to test it.
Below are two groups of runs. SAC_SingleProcess means I am running only one Python training instance (3 seeds run sequentially); SAC_MultiProcessOverTaxes means I am running 8 Python instances in parallel. The environment and the hyperparameters are identical.
_MeanTrainingSuccessRate is the average success rate (success defined as the agent reaching the goal instead of timing out) over the last 250 episodes, and the final success rate is the average over 1000 episodes with deterministic sampling. The timescale is set to a reasonable 5, and there are 8 agents in the scene.
I am not using a physics-based character controller (I know the physics simulation can go wild at high timescales); instead, my agent uses the controller provided in the MicroFPS template, which in turn uses CharacterController.Move().
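To make the later questions concrete, here is a minimal sketch of the movement pattern I mean (my own simplification, not the exact template code; names like `gravityDownForce` and `moveSpeed` are approximate):

```csharp
using UnityEngine;

// Simplified MicroFPS-style movement: velocity is accumulated manually
// and CharacterController.Move() applies the resulting displacement.
[RequireComponent(typeof(CharacterController))]
public class SimpleCharacterMover : MonoBehaviour
{
    public float moveSpeed = 7f;          // target horizontal speed (m/s)
    public float gravityDownForce = 20f;  // gravity acceleration (m/s^2)

    CharacterController m_Controller;
    Vector3 m_CharacterVelocity;          // persistent velocity (m/s)

    void Start()
    {
        m_Controller = GetComponent<CharacterController>();
    }

    void Update()
    {
        // Horizontal input sets the horizontal part of the velocity directly.
        Vector3 input = new Vector3(Input.GetAxis("Horizontal"), 0f, Input.GetAxis("Vertical"));
        Vector3 horizontal = transform.TransformDirection(input) * moveSpeed;
        m_CharacterVelocity = new Vector3(horizontal.x, m_CharacterVelocity.y, horizontal.z);

        if (m_Controller.isGrounded && m_CharacterVelocity.y < 0f)
        {
            m_CharacterVelocity.y = 0f;  // stop accumulating gravity on the ground
        }
        else
        {
            // First Time.deltaTime: integrate acceleration into velocity.
            m_CharacterVelocity += Vector3.down * gravityDownForce * Time.deltaTime;
        }

        // Second Time.deltaTime: integrate velocity into displacement.
        m_Controller.Move(m_CharacterVelocity * Time.deltaTime);
    }
}
```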
Based on this observation, I have a few questions:

- Is it possible that some of the DecisionRequests are being dropped, i.e. the Python process is too slow to respond in time? I believe the Unity environment waits for the response from the Python side before continuing the simulation, but I want to confirm my understanding (a sketch of my mental model follows after this list).
- Is it possible that CharacterController.Move() acts unexpectedly when the process is resource-starved?
- Should I be multiplying characterVelocity by Time.deltaTime when calling m_Controller.Move(characterVelocity * Time.deltaTime), as in the sketch above? (And why is gravity multiplied twice…?)
- Are there any other aspects that you think I should check?
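To clarify the first question, this is roughly my mental model of the decision flow. It is a heavily simplified sketch: the real DecisionRequester hooks into Academy callbacks rather than FixedUpdate, and the field names here are my own.

```csharp
using Unity.MLAgents;
using UnityEngine;

// My mental model: ask for a fresh decision every decisionPeriod steps
// and repeat the last received action on the steps in between.
public class ManualDecisionRequester : MonoBehaviour
{
    public int decisionPeriod = 5;

    Agent m_Agent;
    int m_StepCount;

    void Awake()
    {
        m_Agent = GetComponent<Agent>();
    }

    void FixedUpdate()
    {
        if (m_StepCount % decisionPeriod == 0)
        {
            // Flags the agent to send observations and receive a new action
            // at the next Academy step; this is the point where I assume
            // Unity waits for the Python side instead of dropping the request.
            m_Agent.RequestDecision();
        }
        else
        {
            // Re-apply the most recent action without requesting a new decision.
            m_Agent.RequestAction();
        }
        m_StepCount++;
    }
}
```

If this model is wrong, and requests can silently time out under load, that would explain the gap between the single-process and multi-process runs.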
Thank you for the cool framework and sorry for the flurry of questions!