Hyperparameter tuning in ML-Agents

Hi there,

Is there a good method to tune hyperparameters with ML-Agents? I know the docs suggest some values and I can just try out combinations one by one, but maybe there is a way to automate this process and test all possible scenarios (from a limited set of values, of course). For example, I would like to test a few hyperparameters with three different values each. Am I wrong, or is the only way to do this (the easiest way I found) to create a learning environment executable, prepare A LOT of configuration files, and run all of these trainings in the Python virtual environment? Or maybe there is an easier way to test all these values?

Thanks!

This is hopelessly outdated (https://github.com/mbaske/ml-agents-hyperparams), but maybe the basic idea still has some merit: write a batch runner that launches mlagents-learn Python processes. Back then, I hacked trainer_controller.py and injected the hyperparameters directly, although simply generating a bunch of config.yaml files upfront should be a better approach. An advantage of hooking into the trainer_controller was that I could check for stop conditions being met; I'm not sure how you could do that with a batch runner that merely launches processes.
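For that config-generation route, here's a minimal sketch of what I mean. The grid values, the behavior name, and the paths are just placeholders, and it assumes a PPO-style trainer config:

```python
import copy
import itertools
import subprocess

import yaml  # pip install pyyaml

# Hypothetical grid: three hyperparameters, three values each -> 27 runs.
grid = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [256, 512, 1024],
    "beta": [1e-3, 5e-3, 1e-2],
}

with open("base.yaml") as f:    # your existing trainer config (placeholder name)
    base = yaml.safe_load(f)

BEHAVIOR = "MyBehavior"         # placeholder behavior name

for i, values in enumerate(itertools.product(*grid.values())):
    cfg = copy.deepcopy(base)
    hp = cfg["behaviors"][BEHAVIOR]["hyperparameters"]
    hp.update(dict(zip(grid.keys(), values)))

    cfg_path = f"config_{i:03d}.yaml"
    with open(cfg_path, "w") as f:
        yaml.safe_dump(cfg, f)

    # Run the trainings one after another against a built executable.
    subprocess.run(
        ["mlagents-learn", cfg_path,
         "--run-id", f"grid_{i:03d}",
         "--env", "builds/MyEnv",  # placeholder path to the environment build
         "--no-graphics"],
        check=True,
    )
```

Each run gets its own --run-id, so afterwards they all show up side by side in TensorBoard.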

Btw, I have no idea why all the ML-Agents folks are listed as contributors in my repo. I must have done some weird forking and rebasing at some point, and GitHub somehow copied them over.

So essentially, if I prepare some configs and run the learning process through an executable (launching all the processes one by one), I will achieve the same result? That is, have all these trainings listed in TensorBoard so I can compare them? I'm pretty new to this topic, so I'm wondering if my idea will give me results similar to what these search algorithms do.

Regarding running mlagents-learn processes, I assume you can do so by pasting the commands into CMD/PowerShell and putting '&&' between them, right?

Well, I thought it would be nice to have some front end where you can specify the search, like "do a grid search for hyperparameters x and y", and then have the batch runner create the config files and launch Python for you. Perhaps one could use TensorBoard's HTTP API for tracking training progress and for conditional early stopping:
https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/scalar/http_api.md
Haven't looked into CMD/PowerShell yet, though.
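A single scalar query against that API looks roughly like this. Just a sketch: it assumes TensorBoard is serving the results folder on localhost:6006, and the run and tag names are placeholders (the exact tags depend on your ML-Agents version):

```python
import json
import urllib.parse
import urllib.request

# Scalar endpoint documented in the http_api.md linked above.
query = urllib.parse.urlencode({
    "run": "grid_000/MyBehavior",            # placeholder run name
    "tag": "Environment/Cumulative Reward",  # tag name may vary per version
})
url = f"http://localhost:6006/data/plugin/scalars/scalars?{query}"

with urllib.request.urlopen(url) as resp:
    points = json.loads(resp.read())  # each entry is [wall_time, step, value]

latest = points[-1]
print(f"step {latest[1]}: reward {latest[2]}")
```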

@mbaske When you were running your hyperparameter tuner, did the whole process take a long time? I mean, if I were to test many different combinations of parameters, repeating each one some 10 times, it would probably take days, considering one training session can last around 10 minutes. I'm asking because, if I understand correctly, a grid search over three parameters with three values each already means 3^3 = 27 training runs (x10 is 270…). Do you know how I could speed this process up? A lower max_steps value?

Also, is there a different way of obtaining all the training data that gets fed to TensorBoard, like average reward, step, etc.? Currently I would have to download each training result as a CSV file, convert it to .xlsx, and compute, for example, the average reward for that run. This would take a very long time…

I have just noticed there are two .json files located in the results folder after running trainings, where you can see the mean, max and min values of all the data involved in the training… I am learning something new every day :slight_smile: So you can disregard my second question :stuck_out_tongue:

@Wolf00007 You're right, total training time grows exponentially with the number of parameters in a grid search. I was hoping that setting stop conditions could filter out training runs that produce bad results early. I'm currently looking into updating the project, and I think it's possible to get it working without having to change any of the ml-agents files.
You can get all training scalars from the TensorBoard HTTP API, btw. A separate Python process could query the API and write the metrics you're interested in to a file.
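If you'd rather not keep a TensorBoard server running at all, another option is to read the event files offline with TensorBoard's EventAccumulator. A rough sketch, with the run directory and tag name as placeholders:

```python
import csv

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

run_dir = "results/grid_000/MyBehavior"  # placeholder run directory
tag = "Environment/Cumulative Reward"    # tag name may vary per version

acc = EventAccumulator(run_dir)
acc.Reload()  # parses the event files on disk

# Write the whole scalar series to CSV, no manual TensorBoard export needed.
with open("rewards.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["step", "value"])
    for event in acc.Scalars(tag):
        writer.writerow([event.step, event.value])
```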

@mbaske Sounds cool, please let me know in this thread if you manage to finish updating your project :slight_smile:

Think I got it working now - except for stopping processes nicely on Windows. Please let me know if you find any issues.
mbaske/ml-agents-hyperparams (Automated Hyperparameter Search for Unity ML-Agents): https://github.com/mbaske/ml-agents-hyperparams

@mbaske Looks good. I see you're using the values that get generated in TensorBoard. I found something weird when it comes to the timers.json file in the run_logs folder, though. I was mainly looking there to see the average reward for the whole training run, but the values there sometimes seem incorrect:

[attached screenshot: excerpt of run_logs/timers.json where the mean reward equals the minimum reward]

How can the mean reward be the same as the lowest reward received? Are these values bugged? For some instances I also got mean rewards that did not look correct (when compared to their graphs in TensorBoard). Do you have the same problem in your training runs? Or do you only look at the graphs and decide on hyperparameters based on the final reward received?

Another example: I did two runs with the same config file, one after the other. The first run showed the mean cumulative reward as 3.77256006114184 in run_logs/timers.json, but the second one showed it as -0.9996000453829765, which makes no sense.

Thanks, I haven't looked into run_logs/timers.json in detail yet, but I agree it looks a bit weird. For one of my runs, I'm seeing "value" being the same as "max" for an Environment.CumulativeReward.mean entry. Not sure how to interpret that, really. For my batch runner, I'm only relying on the TensorBoard scalar values, mainly because I wanted to track the training progress and abort runs when they meet some stop condition. I don't think this would be possible with timers.json; AFAIK it's only written when training is complete.

Are you looking at the graphs only and picking the runs with the highest consistent rewards? Or do you look at the last reward only?

I'm only checking the latest value for whatever TensorBoard tag is set in a stop condition (my yaml opt_stop param). It doesn't have to be rewards necessarily; it could just as well be some custom metric you're sending via the StatsRecorder. But yeah, it's basically a 'dumb' grid search, because it doesn't do any evaluation of the overall training performance. It should be interesting, though, to dynamically pick or even generate config params in order to home in on the best value combinations. Well, maybe some other time.
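For what it's worth, the stop-condition check boils down to something like this. A simplified sketch, not the actual repo code; the run name, tag, threshold, and paths are all placeholders:

```python
import json
import subprocess
import time
import urllib.parse
import urllib.request

def latest_scalar(run: str, tag: str) -> float:
    """Return the most recent value of a TensorBoard scalar tag."""
    query = urllib.parse.urlencode({"run": run, "tag": tag})
    url = f"http://localhost:6006/data/plugin/scalars/scalars?{query}"
    with urllib.request.urlopen(url) as resp:
        points = json.loads(resp.read())  # entries are [wall_time, step, value]
    return points[-1][2]

proc = subprocess.Popen(["mlagents-learn", "config_000.yaml",
                         "--run-id", "grid_000",
                         "--env", "builds/MyEnv", "--no-graphics"])
while proc.poll() is None:  # still training
    time.sleep(60)
    try:
        # Placeholder condition: abort once the metric clears a threshold.
        if latest_scalar("grid_000/MyBehavior", "Environment/Cumulative Reward") >= 1.0:
            proc.terminate()  # this is the part that isn't graceful on Windows
            break
    except Exception:
        pass  # the scalar may not exist yet early in training
```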

Hi mbaske,
that is a very interesting project. Is it still usable with current versions of ML-Agents (Releases 19-20)?

Hi - I haven't used this for a while, and don't have Unity/ML-Agents set up right now. My guess is it should still work, can't make any promises though.
