Description
When I run training for the WallJump example in the ml-agents-release1 folder with
mlagents-learn config/trainer_config.yaml --run-id=WallJump2 --force
and press the Play button, training starts as usual, but everything stops after about 30 seconds. The agent freezes in midair, the Unity window stops responding, and the Command Prompt prints no further output. During this time a Python process uses about 40% of the CPU. Pressing Ctrl-C in the Command Prompt unfreezes the Unity window, but the Python process keeps running (still at about 40% CPU), and I have to end it from Task Manager. The last line of the CMD output below (UnityEnvironment worker 0: environment stopping.) only appears after I press Ctrl-C.
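To see where the Python side is stuck during the freeze, one option I have not tried yet is the standard-library faulthandler watchdog, which periodically dumps every thread's Python stack to a file from a separate C-level thread, so it should keep working while the main thread is blocked. This is a minimal sketch; the helper names are my own, and it only covers the trainer's main process, not the UnityEnvironment worker subprocess (the log shows subprocess_env_manager.py is in use).

```python
import faulthandler
from typing import IO


def start_stack_watchdog(log_path: str, interval_s: float = 30.0) -> IO[str]:
    """Dump all threads' Python stacks to log_path every interval_s seconds.

    faulthandler writes directly to the file descriptor from its own
    watchdog thread, so dumps still happen if the main thread hangs.
    The returned file must stay open until stop_stack_watchdog() runs.
    """
    log_file = open(log_path, "w")
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=log_file)
    return log_file


def stop_stack_watchdog(log_file: IO[str]) -> None:
    """Cancel the periodic dumps, then close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

Assumed usage: in a small wrapper script, call `handle = start_stack_watchdog("python_stacks.log")`, then call the mlagents-learn entry point (I believe it is `mlagents.trainers.learn.main`, which reads its arguments from `sys.argv`) inside a `try` block with `stop_stack_watchdog(handle)` in the `finally`, and run the wrapper with the same arguments as the command above.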
Any idea what could be going wrong? I am using ML-Agents Release 1 as downloaded from the GitHub page.
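To check whether the freeze also happens without the PPO trainer and TensorFlow, the environment could be stepped directly from Python. The sketch below is hypothetical: `step_without_trainer` is my own helper, and it only assumes an environment object with `reset()`, `step()` and `close()` methods. I believe mlagents_envs 0.16.0's low-level `UnityEnvironment(file_name=None)` fits that shape, waits for the Editor's Play button on port 5004, and sends default actions when none are set, but I have not verified this.

```python
import time
from typing import Any, Callable


def step_without_trainer(
    env: Any,
    n_steps: int,
    report_every: int = 500,
    log: Callable[[str], None] = print,
) -> int:
    """Step env with no trainer attached, logging progress periodically.

    If the hang is in the Unity <-> Python communication rather than in
    the trainer, the progress lines should stop appearing here too.
    Returns the number of steps completed.
    """
    env.reset()
    start = time.monotonic()
    completed = 0
    try:
        for completed in range(1, n_steps + 1):
            # No actions are set; the env is assumed to use default actions.
            env.step()
            if completed % report_every == 0:
                log(f"{completed} steps after {time.monotonic() - start:.1f}s")
    finally:
        # Always release the connection, even if a step raises.
        env.close()
    return completed
```

Assumed usage: `from mlagents_envs.environment import UnityEnvironment`, then `step_without_trainer(UnityEnvironment(file_name=None), n_steps=20000)`, and press Play in the Editor. If the progress lines keep coming well past the 30-second mark, the problem is more likely on the trainer side.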
Versions
Unity: 2019.3.13f1
Python: 3.7.7
ml-agents: 0.16.0
ml-agents-envs: 0.16.0
Communicator API: 1.0.0
TensorFlow: 2.1.0
CMD output
mlagents-learn config/trainer_config.yaml --run-id=WallJump2 --force
2020-05-30 21:29:10.318307: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:From C:\Users\nihal\anaconda3\envs\tf_gpu\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Version information:
ml-agents: 0.16.0,
ml-agents-envs: 0.16.0,
Communicator API: 1.0.0,
TensorFlow: 2.1.0
2020-05-30 21:29:13.318042: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:From C:\Users\nihal\anaconda3\envs\tf_gpu\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-05-30 21:29:15 INFO [environment.py:201] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
2020-05-30 21:29:20 INFO [environment.py:111] Connected to Unity environment with package version 1.0.0-preview and communication version 1.0.0
2020-05-30 21:29:20 INFO [environment.py:342] Connected new brain:
SmallWallJump?team=0
2020-05-30 21:29:20.729678: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-05-30 21:29:20.739893: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-05-30 21:29:20.776772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-05-30 21:29:20.786215: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-30 21:29:20.795884: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-30 21:29:20.805575: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-30 21:29:20.812470: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-30 21:29:20.825955: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-30 21:29:20.834143: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-30 21:29:20.846617: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-30 21:29:20.853374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-30 21:29:21.471889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-30 21:29:21.477498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-05-30 21:29:21.481310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-05-30 21:29:21.485303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-05-30 21:29:21 WARNING [stats.py:197] events.out.tfevents.1590851932.LAPTOP-3CHHMIT0 was left over from a previous run. Deleting.
2020-05-30 21:29:21 WARNING [stats.py:197] events.out.tfevents.1590851971.LAPTOP-3CHHMIT0 was left over from a previous run. Deleting.
2020-05-30 21:29:21 INFO [stats.py:130] Hyperparameters for behavior name WallJump2_SmallWallJump:
trainer: ppo
batch_size: 128
beta: 0.005
buffer_size: 2048
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 5e6
memory_size: 128
normalize: False
num_epoch: 3
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 20000
use_recurrent: False
vis_encode_type: simple
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
summary_path: WallJump2_SmallWallJump
model_path: ./models/WallJump2/SmallWallJump
keep_checkpoints: 5
2020-05-30 21:29:21.522400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-05-30 21:29:21.533581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-30 21:29:21.538149: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-30 21:29:21.542434: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-30 21:29:21.547138: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-30 21:29:21.551515: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-30 21:29:21.556379: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-30 21:29:21.561277: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-30 21:29:21.566907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-30 21:29:21.569910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-30 21:29:21.574760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-05-30 21:29:21.577636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-05-30 21:29:21.580530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-05-30 21:29:23.056474: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-30 21:29:23 INFO [environment.py:342] Connected new brain:
BigWallJump?team=0
2020-05-30 21:29:23 WARNING [env_manager.py:109] Agent manager was not created for behavior id BigWallJump?team=0.
2020-05-30 21:29:23.572885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-05-30 21:29:23.582329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-30 21:29:23.587127: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-30 21:29:23.591483: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-30 21:29:23.596612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-30 21:29:23.601308: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-30 21:29:23.606158: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-30 21:29:23.610582: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-30 21:29:23.616124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-30 21:29:23.619079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-30 21:29:23.623984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-05-30 21:29:23.626724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-05-30 21:29:23.630220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-05-30 21:29:23 WARNING [stats.py:197] events.out.tfevents.1590851934.LAPTOP-3CHHMIT0 was left over from a previous run. Deleting.
2020-05-30 21:29:23 WARNING [stats.py:197] events.out.tfevents.1590851973.LAPTOP-3CHHMIT0 was left over from a previous run. Deleting.
2020-05-30 21:29:23 INFO [stats.py:130] Hyperparameters for behavior name WallJump2_BigWallJump:
trainer: ppo
batch_size: 128
beta: 0.005
buffer_size: 2048
epsilon: 0.2
hidden_units: 256
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 2e7
memory_size: 128
normalize: False
num_epoch: 3
num_layers: 2
time_horizon: 128
sequence_length: 64
summary_freq: 20000
use_recurrent: False
vis_encode_type: simple
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
summary_path: WallJump2_BigWallJump
model_path: ./models/WallJump2/BigWallJump
keep_checkpoints: 5
2020-05-30 21:29:23.658786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-05-30 21:29:23.668717: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-30 21:29:23.672988: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-30 21:29:23.678131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-30 21:29:23.682779: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-30 21:29:23.687796: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-30 21:29:23.692050: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-30 21:29:23.697388: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-30 21:29:23.702211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-30 21:29:23.705757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-30 21:29:23.710373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-05-30 21:29:23.713525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-05-30 21:29:23.717303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-05-30 21:30:12 INFO [subprocess_env_manager.py:191] UnityEnvironment worker 0: environment stopping.