More info needed in the docs regarding torch_settings: device:

EDIT:
I found the reason, but I couldn’t find any reference to it in the documentation on the git. If any Unity dev sees this, could you please shed some light on it or add more info to the docs?
So I’m changing the title tag to feedback.

ORIGINAL:
I’m training with a server build (on my PC) with the --no-graphics flag, but my GPU is still being utilized at over 90%.
I don’t have visual observations.
I didn’t set ML Agents to train on the GPU, and my agents are set up to use Burst.

Is this normal?
Is ML Agents training on the GPU automatically?

I’m not sure why you’re getting GPU usage on a dedicated server build, but one thing I have noticed is that the Update rate is really high on my own dedicated server builds (200-300 fps) and uses up all available CPU. I usually drop the Update rate to about 10 fps by setting Application.targetFrameRate, e.g. see peaceful-pie/PeacefulPie/Simulation.cs at 442b2a9ffce43ab0923ad642176a46eb53457e64 · hughperkins/peaceful-pie · GitHub (that file also shows one way to detect when running as a dedicated server).

I found out why,
the torch settings in the config file were set to

torch_settings:
  device: null

which for some reason made the trainer use the GPU. I searched the git repo for this setting but only found this page, which doesn’t say much.

If anyone knows where I can get more info about this, it would be appreciated!

Ah, torch is the library that’s used for running the neural network. You probably want that to be using the gpu, if you have one: it should run faster.

I’m not sure what information you are looking for here, but e.g. the pytorch doc on device is here: Tensor Attributes — PyTorch 2.4 documentation. I think that setting it to null making it use the gpu is probably a Unity thing. (Normally it defaults to cpu in torch, I think, but defaulting to gpu definitely makes sense for best performance.)
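
For illustration, here is a minimal PyTorch sketch (not ML-Agents code, just an assumption about what the device setting boils down to on the trainer side) of how a device is typically chosen and applied:

    import torch

    # Pick the GPU if one is available, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device)

    # Tensors (and therefore a policy network's weights) can be moved to that device.
    x = torch.randn(8, 16)
    print(x.to(device).device)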

Thanks!!

I found a few references online stating that using the GPU with PPO doesn’t really make a difference, and is sometimes even worse.
2 examples

I read some more in the past

Depends on the size of your network. But yeah, with a few rays as input, and using a small stack of Linear layers for the network, gpu is not going to change much.

If you start feeding images into your network, and you start using convolutional layers, then gpu becomes more useful.

Not that you will get better results using images - in fact, everything will just learn much more slowly - but it depends on what you are trying to do.

Oh, I see. As you say, it looks like the mlagents implementation of PPO is not optimized for GPU: Using CPU vs GPU in training with ML-Agents · Issue #1246 · Unity-Technologies/ml-agents · GitHub. Interesting.

I’m updating this again because I did some tests to see whether training is faster with the CPU or GPU setting.
I noticed a significant increase in performance with the GPU; the CPU setting was much slower to update the policy (I didn’t save statistics).

Just for reference, the agent’s network size is 400 units and 2 hidden layers.
It has a total of 649 observation inputs
and outputs of 1 bool and 2 Vector3s.
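
To put that size in perspective, here is a rough PyTorch sketch of a comparably sized network (the actual ML-Agents policy architecture and action handling will differ; the 7 outputs below are just a stand-in for the 1 bool + 2 Vector3 actions):

    import torch
    import torch.nn as nn

    # Roughly the size discussed above: 649 observations in,
    # two hidden layers of 400 units each.
    policy = nn.Sequential(
        nn.Linear(649, 400),
        nn.ReLU(),
        nn.Linear(400, 400),
        nn.ReLU(),
        nn.Linear(400, 7),  # stand-in for 1 bool + 2 Vector3 actions
    )

    n_params = sum(p.numel() for p in policy.parameters())
    print(n_params)  # about 423k parameters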

Interesting. Glad that GPU does work on mlagents PPO :slight_smile:

My own experience is that for small networks, yes, the learning phase of the policy will be slightly faster on GPU than on CPU, but the difference is still fairly small relative to the time it takes to run the game. With Nature-CNN sized networks, e.g. stable-baselines3/stable_baselines3/common/torch_layers.py at bea3c44ba52278ec755af0179859b04ab80cdcaf · DLR-RM/stable-baselines3 · GitHub, running on CPU becomes prohibitively slow (3-5 minute pauses per learning phase…), and GPU becomes vastly preferable.
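
For comparison, here is a sketch of a Nature-CNN sized feature extractor in PyTorch (the classic DQN architecture that NatureCNN follows; treat the exact dimensions as illustrative rather than a copy of the SB3 code):

    import torch
    import torch.nn as nn

    # "Nature CNN" style feature extractor for 84x84 stacked-frame images.
    cnn = nn.Sequential(
        nn.Conv2d(4, 32, kernel_size=8, stride=4),
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2),
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, stride=1),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, 512),  # 64 x 7 x 7 feature map for 84x84 input
        nn.ReLU(),
    )

    x = torch.randn(1, 4, 84, 84)  # a batch of one observation
    print(cnn(x).shape)  # torch.Size([1, 512])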

(actually, re-reading this, I guess 649 inputs is quite a lot :slight_smile: )