I’m training a self-driving car AI, but training stalls every time buffer_size steps are reached (e.g. if buffer_size is 1000, training freezes at 1000 steps, 2000 steps, 3000 steps, …). After 5 to 8 minutes, training resumes normally until the next multiple of buffer_size is reached, and the process repeats.
As I understand it, buffer_size determines when the model is updated. Does this mean my model is simply too large (a 20x20 grid sensor with 3 tags and 2 stacks, plus some additional raycasts), or can I improve this by changing my hyperparameters?
Here are my hyperparameters:
behaviors:
  CarAgentFollow:
    trainer_type: ppo
    hyperparameters:
      # Hyperparameters common to PPO and SAC
      batch_size: 4096
      buffer_size: 65536
      learning_rate: 3.0e-4
      learning_rate_schedule: linear
      # PPO-specific hyperparameters
      # Replaces the "PPO-specific hyperparameters" section above
      beta: 5.0e-3
      beta_schedule: linear
      epsilon: 0.2
      epsilon_schedule: linear
      lambd: 0.9
      num_epoch: 13
    # Configuration of the neural network (common to PPO/SAC)
    network_settings:
      vis_encode_type: simple
      normalize: true
      hidden_units: 128
      num_layers: 2
    # Trainer configurations common to all trainers
    max_steps: 3.5e6
    time_horizon: 512
    summary_freq: 10000
    keep_checkpoints: 5
    checkpoint_interval: 40000
    threaded: true
    init_path: null
    reward_signals:
      # environment reward (default)
      extrinsic:
        strength: 1.0
        gamma: 0.99
      # curiosity module
      curiosity:
        strength: 0.01
        gamma: 0.99
        learning_rate: 3.0e-4
environment_parameters:
  levels:
    curriculum:
      - name: ObstaclesDodge_Easy
        completion_criteria:
          measure: reward
          behavior: CarAgentFollow
          signal_smoothing: true
          threshold: 4.75
          min_lesson_length: 100
        value:
          sampler_type: uniform
          sampler_parameters:
            min_value: 1
            max_value: 2
      - name: ObstaclesDodge_Medium
        completion_criteria:
          measure: reward
          behavior: CarAgentFollow
          signal_smoothing: true
          threshold: 4.65
          min_lesson_length: 100
        value:
          sampler_type: uniform
          sampler_parameters:
            min_value: 2
            max_value: 3
      - name: ObstaclesDodge_Hard
        completion_criteria:
          measure: reward
          behavior: CarAgentFollow
          signal_smoothing: true
          threshold: 4.55
          min_lesson_length: 100
        value:
          sampler_type: uniform
          sampler_parameters:
            min_value: 3
            max_value: 5
      - name: ObstaclesDodge_Expert
        value:
          sampler_type: uniform
          sampler_parameters:
            min_value: 5
            max_value: 8
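For reference, here is a rough back-of-the-envelope sketch of how much work each update involves with the values above. It assumes PPO makes num_epoch passes over the full buffer, split into minibatches of batch_size, which is my understanding of the trainer rather than something I have verified in its source. The function name is made up for illustration.

```python
def minibatch_updates_per_policy_update(buffer_size: int, batch_size: int, num_epoch: int) -> int:
    """Estimate gradient steps per policy update.

    Assumption: each epoch walks the whole buffer once in chunks of
    batch_size, and this is repeated num_epoch times.
    """
    batches_per_epoch = buffer_size // batch_size
    return batches_per_epoch * num_epoch


# Values taken from the config in this post.
updates = minibatch_updates_per_policy_update(
    buffer_size=65536, batch_size=4096, num_epoch=13
)
print(updates)  # 16 batches per epoch * 13 epochs = 208
```

If that assumption holds, every pause corresponds to roughly 208 gradient steps, each over 4096 samples that include the 20x20 grid sensor observations.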
Any help would be greatly appreciated.