Hello, I’m trying to run a batch of tests for some trained agents via a batch file. The problem is that runs don’t end at max_steps, so the script only ever tests the first agent-environment pair before I have to stop it manually.
The documentation says that max_steps is the total number of steps before ending training, which is consistent with the run never ending in inference mode. Is this intentional, and is there some other way to specify an end point for inference runs?
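In the meantime, one stopgap I’m considering is bounding each run from the batch file itself. A rough, untested sketch, assuming GNU coreutils’ timeout is available and that --inference is how these runs are launched; test_config.yaml, the 300-second budget, and the second build path are made up, while the first path and the run IDs are taken from the log below:

# hypothetical testing_single_area_offsets.sh: cap each inference run at a
# fixed wall-clock budget (300 s here is arbitrary), then deliver SIGINT,
# the same signal as my manual Ctrl-C, so the script can move on to the
# next agent-environment pair.
timeout --signal=INT 300 mlagents-learn test_config.yaml \
    --env=/home/timjaris/Desktop/Unity_Builds/1area_offsets/USC \
    --run-id="test:1area_offsets_base_train_set" --inference
timeout --signal=INT 300 mlagents-learn test_config.yaml \
    --env=/home/timjaris/Desktop/Unity_Builds/2area_offsets/USC \
    --run-id="test:2area_offsets_base_train_set" --inference

Since SIGINT should be handled the same way as my manual Ctrl-C, each run ought to shut down cleanly before the next one starts, but I haven’t verified that.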
I think I can solve it by training with a learning rate of 0, but that seems a bit janky to me.
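For reference, that workaround would look something like this in the trainer config (a sketch, untested; I’m assuming the new behaviors: layout, and everything not shown stays as in the hyperparameter dump below):

behaviors:
  Take Cover:
    trainer_type: ppo
    hyperparameters:
      learning_rate: 0.0          # freeze updates: gradient steps become exact zeros
      learning_rate_schedule: constant
    init_path: results/1area_offsets/Take Cover
    max_steps: 20000              # however long the test should run; honored in training mode

One thing I’m not sure about: with normalize: True, the observation-normalization running averages may still be updated while “training”, so the policy might not be perfectly frozen.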
Anyway, here’s the output. I think you can ignore the TensorFlow/CUDA errors; I don’t believe they’re related:
sh testing_single_area_offsets.sh
2021-03-12 13:41:58.069897: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /home/timjaris/anaconda3/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
[ml-agents ASCII-art Unity logo]
Version information:
ml-agents: 0.17.0,
ml-agents-envs: 0.17.0,
Communicator API: 1.0.0,
TensorFlow: 2.3.0
Found path: /home/timjaris/Desktop/Unity_Builds/1area_offsets/USC.x86_64
2021-03-12 13:42:06 INFO [environment.py:108] Connected to Unity environment with package version 1.0.0-preview and communication version 1.0.0
2021-03-12 13:42:07 INFO [environment.py:265] Connected new brain:
Take Cover?team=0
2021-03-12 13:42:07.524523: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-12 13:42:07.550971: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3699850000 Hz
2021-03-12 13:42:07.551321: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5620e22b6050 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-12 13:42:07.551353: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-03-12 13:42:07.554740: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-03-12 13:42:07.560892: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE: forward compatibility was attempted on non supported HW
2021-03-12 13:42:07.561105: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: MLM
2021-03-12 13:42:07.562048: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: MLM
2021-03-12 13:42:07.562184: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.39.0
2021-03-12 13:42:07.562247: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.32.3
2021-03-12 13:42:07.562258: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 460.32.3 does not match DSO version 460.39.0 -- cannot find working devices in this configuration
2021-03-12 13:42:07 WARNING [stats.py:197] events.out.tfevents.1615499247.MLM.meta was left over from a previous run. Deleting.
2021-03-12 13:42:07 WARNING [stats.py:197] events.out.tfevents.1615499247.MLM was left over from a previous run. Deleting.
2021-03-12 13:42:07 INFO [stats.py:130] Hyperparameters for behavior name Take Cover:
trainer_type: ppo
hyperparameters:
  batch_size: 1024
  buffer_size: 10240
  learning_rate: 0.0003
  beta: 0.005
  epsilon: 0.2
  lambd: 0.95
  num_epoch: 3
  learning_rate_schedule: linear
network_settings:
  normalize: True
  hidden_units: 256
  num_layers: 3
  vis_encode_type: simple
  memory: None
reward_signals:
  extrinsic:
    gamma: 0.99
    strength: 1.0
init_path: results/1area_offsets/Take Cover
keep_checkpoints: 5
checkpoint_interval: 5000
max_steps: 5000
time_horizon: 64
summary_freq: 2500
threaded: True
self_play: None
behavioral_cloning: None
2021-03-12 13:42:09 INFO [tf_policy.py:165] Loading model for brain Take Cover?team=0 from results/1area_offsets/Take Cover.
2021-03-12 13:42:09 INFO [tf_policy.py:191] Starting training from step 0 and saving to results/test:1area_offsets_base_train_set/Take Cover.
2021-03-12 13:42:51 INFO [stats.py:111] Take Cover: Step: 2500. Time Elapsed: 52.081 s Mean Reward: 9.945. Std of Reward: 0.030. Not Training.
2021-03-12 13:43:40 INFO [stats.py:111] Take Cover: Step: 5000. Time Elapsed: 101.061 s Mean Reward: 9.943. Std of Reward: 0.027. Not Training.
2021-03-12 13:43:40 INFO [rl_trainer.py:151] Checkpointing model for Take Cover.
2021-03-12 13:44:25 INFO [stats.py:111] Take Cover: Step: 7500. Time Elapsed: 145.979 s Mean Reward: 9.942. Std of Reward: 0.029. Not Training.
2021-03-12 13:45:03 INFO [stats.py:111] Take Cover: Step: 10000. Time Elapsed: 183.770 s Mean Reward: 9.941. Std of Reward: 0.030. Not Training.
2021-03-12 13:45:03 INFO [rl_trainer.py:151] Checkpointing model for Take Cover.
2021-03-12 13:45:38 INFO [stats.py:111] Take Cover: Step: 12500. Time Elapsed: 218.963 s Mean Reward: 9.944. Std of Reward: 0.030. Not Training.
2021-03-12 13:46:26 INFO [stats.py:111] Take Cover: Step: 15000. Time Elapsed: 266.651 s Mean Reward: 9.942. Std of Reward: 0.030. Not Training.
2021-03-12 13:46:26 INFO [rl_trainer.py:151] Checkpointing model for Take Cover.
2021-03-12 13:47:06 INFO [stats.py:111] Take Cover: Step: 17500. Time Elapsed: 307.183 s Mean Reward: 9.947. Std of Reward: 0.027. Not Training.
2021-03-12 13:47:46 INFO [stats.py:111] Take Cover: Step: 20000. Time Elapsed: 346.459 s Mean Reward: 9.947. Std of Reward: 0.025. Not Training.
2021-03-12 13:47:46 INFO [rl_trainer.py:151] Checkpointing model for Take Cover.
^C