Hi, I came across this while trying to find a more detailed explanation of what exactly buffer size is, but now I realize I am unclear about experiences themselves. This is especially with regard to PPO.
From my understanding, an experience is a step that involves a query to the policy, not just any regular agent step. What I mean is: if the "Max Step" parameter on the Agent script is set to, say, 2500, this refers to the number of actions the agent performs per episode, which is analogous to how many times FixedUpdate is called. However, the "Decision Period" parameter on the Decision Requester component makes things a little more confusing. Suppose Decision Period is set to 5; this means that only every 5th action actually queries the policy for a new action (and consequently collects a reward). That would make 2500 / 5 = 500 steps the "true" steps that count as "experiences". Am I right in thinking this?
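To make my mental arithmetic concrete, here is how I am counting experiences in a tiny Python sketch (max_step and decision_period are just my own names for the Agent's "Max Step" and the Decision Requester's "Decision Period"):

max_step = 2500        # Agent "Max Step": FixedUpdate ticks / actions per episode
decision_period = 5    # Decision Requester "Decision Period"

# my assumption: only the ticks on which the policy is queried count as experiences
experiences_per_episode = max_step // decision_period
print(experiences_per_episode)  # 500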
So, assuming I am right, we have 500 experiences per episode. Out of these 500 experiences, only the most recent "time_horizon" of them contribute to the outcome of that episode. Am I right about the time_horizon part?
Next, after "buffer_size" experiences have been collected, the policy is actually updated. So if my buffer_size is 30000, the policy is updated after 30000 experiences, i.e. 30000 / 500 = 60 episodes. I am assuming this buffer is what is called the experience buffer.
"batch_size" is another thing I find challenging to understand. We split our "buffer_size" experiences into n batches of "batch_size" experiences each and "update" the policy n times. So if batch_size is 300 and buffer_size is 30000, we do 30000 / 300 = 100 "iterations" of gradient descent. What I don't understand is where "num_epoch" comes into play here, or what its purpose is.
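If I had to guess, num_epoch means the whole buffer is reused that many times, so the update loop would look roughly like this Python sketch (my guess, not the actual trainer code, so please correct me if this is wrong):

buffer_size = 30000
batch_size = 300
num_epoch = 3      # value from my config below

gradient_steps = 0
for epoch in range(num_epoch):                      # reuse the whole buffer num_epoch times
    # (presumably the buffer is shuffled here)
    for start in range(0, buffer_size, batch_size):
        # one gradient-descent step on experiences[start : start + batch_size]
        gradient_steps += 1

print(gradient_steps)  # 3 * (30000 / 300) = 300 gradient steps per policy update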
My next question is: how do I calculate the memory used by my experience buffer? I keep getting errors related to "sequence_length invalid", "broken pipe", or "UnityEnvironment worker 0: environment raised an unexpected exception." when I try to increase my buffer_size to 8192 or more. I know increasing buffer_size leads to more memory consumption (RAM? VRAM? I am not sure which), but this seems like a relatively small buffer size, so I don't think I should be getting these errors. I will post the error logs below, but before that I want to clarify the memory calculation.
Mem(Experience Buffer) = [Mem(Observations) + Mem(Actions) + Mem(Reward)] * buffer_size
Is this correct?
In my scenario, I just want a car to put a ball in the goal.
Car has 2 continuous input actions:
- Throttle - Forward / Backward Acceleration - Float
- Steering Direction - Float
Mem(Actions) = 4 + 4 = 8 Bytes
Observations:
- 32 x 32 Grayscale FPS Visual Observation
- Single Raycast Distance - Float
- Current Steering Direction - Float
- Current Throttle - Float
Mem(Observation) = (32 x 32) + 4 + 4 + 4 = 1024 + 4 + 4 + 4 = 1036 Bytes
Rewards:
- Discrete reward when Car makes contact with Ball
- Discrete reward when Ball makes contact with Goal
- Inverse Distance Squared from Car to Ball (cutoff after contact with ball)
- Inverse Distance Squared from Ball to Goal (starts after contact with ball)
Mem(Rewards) = 4 Bytes (since all of them are summed into one float)
Plugging these into the equation:
Mem(Experience Buffer) = (8 + 1036 + 4) * 8192 = 8,585,216 bytes ≈ 8.6 MB
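As a quick sanity check, here is the same calculation in Python (all per-item sizes are my own assumptions, in particular 1 byte per grayscale pixel; if the pixels are actually stored as 32-bit floats, the visual observation alone would be 4096 bytes and the total roughly four times larger):

buffer_size = 8192

# my assumed per-experience sizes in bytes (guesses, not taken from the ML-Agents source)
mem_actions = 4 + 4                          # throttle + steering, one float each
mem_observations = 32 * 32 * 1 + 4 + 4 + 4   # grayscale pixels at 1 byte each + 3 floats
mem_reward = 4                               # all rewards summed into one float

per_experience = mem_actions + mem_observations + mem_reward
total_bytes = per_experience * buffer_size
print(per_experience, total_bytes, total_bytes / 1e6)  # 1048, 8585216, ~8.6 MB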
If this calculation is right, then I should have no problem with my 16 GB of RAM and my Nvidia 3070 Ti with 8 GB of dedicated VRAM. I am stating both because I still fail to understand how to properly utilize the GPU during ML-Agents training, due to the poor documentation on this subject. The only thing I am doing to utilize my GPU right now is adding --torch-device=cuda to my mlagents-learn command. I have of course installed the CUDA build of PyTorch and made sure to get the corresponding CUDA Toolkit version. I have no idea where this experience buffer is being stored; I checked Task Manager and that was pretty unhelpful too.
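For what it's worth, the only GPU check I know how to do is from Python, and it only confirms that PyTorch can see the GPU, not where the experience buffer lives:

import torch

print(torch.__version__)               # 1.13.1+cu117 in my environment
print(torch.cuda.is_available())       # True if the CUDA build of PyTorch sees the GPU
print(torch.cuda.get_device_name(0))   # e.g. "NVIDIA GeForce RTX 3070 Ti"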
I would really appreciate it if someone could clarify these for me.
Error logs from my latest run (batch_size 1024, buffer_size 10240):
(mlagents) C:\Users\Anurag\ml-agents-latest_release>mlagents-learn config/Car2Ball_visual_curiosity_config_v3.yaml --run-id=test3_1024_10240 --torch-device=cuda --resume
Version information:
  ml-agents: 1.0.0,
  ml-agents-envs: 1.0.0,
  Communicator API: 1.5.0,
  PyTorch: 1.13.1+cu117
[WARNING] Training status file not found. Not all functions will resume properly.
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
[INFO] Connected to Unity environment with package version 3.0.0-exp.1 and communication version 1.5.0
[INFO] Connected new brain: Car2Ball?team=0
[INFO] Hyperparameters for behavior name Car2Ball:
  trainer_type: ppo
  hyperparameters:
    batch_size: 1024
    buffer_size: 10240
    learning_rate: 0.0003
    beta: 0.005
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 3
    shared_critic: True
    learning_rate_schedule: linear
    beta_schedule: constant
    epsilon_schedule: linear
  checkpoint_interval: 500000
  network_settings:
    normalize: False
    hidden_units: 128
    num_layers: 2
    vis_encode_type: simple
    memory: None
    goal_conditioning_type: hyper
    deterministic: False
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
      network_settings:
        normalize: False
        hidden_units: 128
        num_layers: 2
        vis_encode_type: simple
        memory: None
        goal_conditioning_type: hyper
        deterministic: False
    curiosity:
      gamma: 0.99
      strength: 0.02
      network_settings:
        normalize: False
        hidden_units: 128
        num_layers: 2
        vis_encode_type: simple
        memory: None
        goal_conditioning_type: hyper
        deterministic: False
      learning_rate: 0.003
      encoding_size: None
  init_path: None
  keep_checkpoints: 5
  even_checkpoints: False
  max_steps: 30000000
  time_horizon: 128
  summary_freq: 50000
  threaded: True
  self_play: None
  behavioral_cloning: None
[INFO] Resuming from results\test3_1024_10240\Car2Ball.
[INFO] Resuming training from step 499978.
[INFO] Car2Ball. Step: 500000. Time Elapsed: 6.339 s. No episode was completed since last summary. Training.
[INFO] Exported results\test3_1024_10240\Car2Ball\Car2Ball-499978.onnx
[INFO] Car2Ball. Step: 550000. Time Elapsed: 46.842 s. Mean Reward: 166.833. Std of Reward: 85.181. Training.
[INFO] Car2Ball. Step: 600000. Time Elapsed: 89.331 s. Mean Reward: 154.484. Std of Reward: 67.336. Training.
[INFO] Car2Ball. Step: 650000. Time Elapsed: 131.203 s. Mean Reward: 140.996. Std of Reward: 85.288. Training.
[INFO] Car2Ball. Step: 700000. Time Elapsed: 173.653 s. Mean Reward: 152.901. Std of Reward: 66.126. Training.
[INFO] Car2Ball. Step: 750000. Time Elapsed: 217.055 s. Mean Reward: 146.363. Std of Reward: 74.872. Training.
[INFO] Car2Ball. Step: 800000. Time Elapsed: 256.871 s. Mean Reward: 148.012. Std of Reward: 72.254. Training.
[INFO] Car2Ball. Step: 850000. Time Elapsed: 298.509 s. Mean Reward: 152.311. Std of Reward: 93.526. Training.
[INFO] Car2Ball. Step: 900000. Time Elapsed: 340.512 s. Mean Reward: 147.693. Std of Reward: 92.437. Training.
[INFO] Car2Ball. Step: 950000. Time Elapsed: 382.421 s. Mean Reward: 152.774. Std of Reward: 66.762. Training.
Exception in thread Thread-2 (trainer_update_func):
Traceback (most recent call last):
[ERROR] UnityEnvironment worker 0: environment raised an unexpected exception.
Traceback (most recent call last):
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 175, in worker
    req: EnvironmentRequest = parent_conn.recv()
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

Process Process-1:
Traceback (most recent call last):
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 175, in worker
    req: EnvironmentRequest = parent_conn.recv()
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 235, in worker
    _send_response(EnvironmentCommand.ENV_EXITED, ex)
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\site-packages\mlagents\trainers\subprocess_env_manager.py", line 150, in _send_response
    parent_conn.send(EnvironmentResponse(cmd_name, worker_id, payload))
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\Anurag\miniconda3\envs\mlagents\lib\multiprocessing\connection.py", line 280, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed
Thanks
Anurag