I am running into two problems during training.
First, I am unable to watch the training reward in TensorBoard, and the reward updates are not printing to the console. I think this may be connected to there being no summaries folder being generated. (I am basing this on the Hummingbird example and how he did it, so let me know if things have just changed since then, because I was able to get TensorBoard to load and show output by pointing it at results instead of summaries.)
Second, after I run the test for a couple of minutes I get this error:
c:\users\capstone.conda\envs\ml-agents-node\lib\site-packages\mlagents\trainers\torch\utils.py:242: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at …\torch\csrc\utils\python_arg_parser.cpp:882.)
res += [data[(partitions == i).nonzero().squeeze(1)]]
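For reference, the line the warning points at is grouping rows of data by partition index inside ml-agents itself; the deprecation warning just means that PyTorch version wants `nonzero()` called with an explicit `as_tuple` argument. A minimal pure-Python sketch of what that masking step does (lists standing in for tensors; the function name is mine, not from ml-agents):

```python
def partition_rows(data, partitions, num_partitions):
    """Group rows of `data` by partition id, mimicking the effect of
    data[(partitions == i).nonzero().squeeze(1)] from the warning."""
    res = []
    for i in range(num_partitions):
        # nonzero() would return the indices where the mask is true;
        # here we collect the matching rows directly.
        indices = [j for j, p in enumerate(partitions) if p == i]
        res.append([data[j] for j in indices])
    return res

rows = ["a", "b", "c", "d"]
parts = [0, 1, 0, 1]
print(partition_rows(rows, parts, 2))  # → [['a', 'c'], ['b', 'd']]
```

As far as I can tell the warning itself is harmless (it comes from ml-agents' own code, not mine), so I don't think it is what is crashing Python below, but I am including it in case it matters.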
If I let training proceed, it runs fine for about 10 minutes and then Python crashes, so I have to stop training and restart it. My configuration file is:
behaviors:
  Node_AI:
    trainer_type: sac
    summary_freq: 50000
    time_horizon: 128
    max_steps: 5.0e6
    keep_checkpoints: 5
    checkpoint_interval: 500000
    init_path: null
    threaded: true
    hyperparameters:
      learning_rate: 3e-4
      batch_size: 100 # this is a guess; typical range is 32 - 512
      buffer_size: 50000
      learning_rate_schedule: constant
      buffer_init_steps: 0
      init_entcoef: 0.5
      save_replay_buffer: true
      tau: 0.005
      steps_per_update: 1
    network_settings:
      hidden_units: 256
      num_layers: 2 # typical is 1 - 3
      normalize: false
      vis_encoder_type: match3
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        strength: 0.05
        gamma: 0.99
    self_play:
      save_steps: 20000
      team_change: 80000
      swap_steps: 5000
      play_against_latest_model_ratio: 0.5
      window: 10
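One thing I tried while debugging the crash was a back-of-envelope estimate of how big the SAC replay buffer might get, since buffer_size is 50000 and save_replay_buffer is on. This is only a rough sketch with a made-up observation size (the 256-float figure is my assumption, not something from ml-agents), but it suggests the buffer alone shouldn't be what is exhausting memory:

```python
# Rough replay-buffer memory estimate. The per-step observation size
# below is a guess for my environment; actual ml-agents storage also
# includes actions, rewards, and bookkeeping, so treat this as a floor.
obs_floats = 256          # hypothetical floats per stored step
bytes_per_float = 4       # float32
buffer_size = 50000       # from buffer_size in the config above

approx_mb = buffer_size * obs_floats * bytes_per_float / 1e6
print(f"~{approx_mb:.0f} MB")  # → ~51 MB
```

Even padding that generously for actions and visual observations, it seems well under what should crash the process, so I suspect the problem is elsewhere.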