Tips/Advice/Guidance on Unity ML Agent Toolkit

I'm an undergrad working on a project, and I'm way behind schedule now, so I desperately need some help here. I started learning how to use Unity and Unity ML-Agents (Build More Engaging Games with ML Agents | Unity) the moment I chose to do this project; a few months have passed, I'm still pretty much clueless, and I only have 2 months left until the deadline.
I have changed my project a couple of times over the past few months to make it easier to handle within the given time, and I've arrived at a final decision: a 3D first-person shooter in which the only player is the ML agent. The game is basically a maze with 5 “treasures” that the ML agent needs to collect, and there are “zombies” spawning randomly; the agent either needs to evade them or kill them. The main point is not to die.
Basically, it needs to:

  1. search and collect all the “treasures”
  2. search for a way out of the maze
  3. survive while doing the above two, by either killing the zombies coming its way or somehow evading them

I'm using curiosity-driven PPO reinforcement learning (curiosity as an intrinsic reward to encourage exploration; I'm not really sure if it's a good idea). I'm not sure whether I'm just not looking hard enough, or maybe I'm just bad at searching for helpful information, but I'm having a hard time finding the sources I need for guidance. So if anyone out there has done something similar, or is doing something similar and making decent progress, would you mind sharing any good sources or materials that can guide me through this project, especially:

  1. any online sources on building an effective reward system for an RL agent
  2. online sources on how to train my agent efficiently and effectively
  3. efficient ways to automate the tuning of the hyperparameters

All of which are within the scope of the Unity ML-Agents Toolkit. I think my biggest issue is that my entire project basically depends on Unity ML-Agents, and yet I'm still having trouble with some of the basics and am still not “comfortable” enough with the toolkit to know my way around it and use it to do what I want to do. Any further tips/advice/guidance on getting the hang of the Unity ML-Agents Toolkit would be greatly appreciated.

TLDR: I need specific online sources or materials to help me master the Unity ML-Agents Toolkit, or at least become sufficiently skilled to do my project, which is described in the second paragraph above.

Hey there and welcome to the forum,

First up, the main problems: IMO none of your questions is really answerable as asked. Here's why, for each question:

  1. There is no real “this is how you have to do a reward system”. Each problem is unique in its own way, so each reward system has to be designed specifically with the problem in mind (see the sketch after this list for what that could look like in your case). More regarding this further down.

  2. The only thing I can recommend regarding this: set up your project so that multiple agents can operate/learn in the same scene (a sketch follows below). Don't add fancy graphics. Don't use complex collider shapes. Also use the Profiler to check what takes up performance, so that you can optimize your code as much as possible.

  3. This simply does not exist in the toolkit. You need to stick to the recommended default values to start with (there's a guide on how to choose starting parameters somewhere in the documentation) and then try out variations to see what works best; a starting config is sketched below. There is no built-in automation for this.
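To make point 1 a bit more concrete anyway: for the task you describe, the reward code usually ends up being a handful of AddReward/SetReward calls on the agent. This is only a minimal sketch under my assumptions about your setup; the class name, the tags, and all reward magnitudes are made up for illustration, and they are exactly the part you will have to tune:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Hypothetical agent for the maze/zombie task described above.
public class MazeAgent : Agent
{
    void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Treasure"))       // assumed tag
        {
            AddReward(0.3f);                    // bonus objective
            other.gameObject.SetActive(false);
        }
        else if (other.CompareTag("Exit"))      // assumed tag
        {
            AddReward(1.0f);                    // main objective
            EndEpisode();
        }
        else if (other.CompareTag("Zombie"))    // assumed tag
        {
            SetReward(-1.0f);                   // dying is the worst outcome
            EndEpisode();
        }
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // ... movement/shooting from actions ...
        AddReward(-0.0005f);  // tiny per-step penalty so the agent doesn't stall
    }
}
```

The general pattern is sparse rewards for the actual objectives plus, at most, very small shaping terms; large shaping rewards are the classic way to get degenerate behaviour.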
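For point 2, the usual trick from the official examples is to make one self-contained “training area” prefab (maze + agent + spawners) and duplicate it across the scene, so a single Unity instance collects many experiences per step. A minimal sketch, where the prefab reference and spacing are assumptions:

```csharp
using UnityEngine;

// Spawns several independent copies of a self-contained training area,
// so multiple agents learn in parallel in the same scene.
public class TrainingAreaSpawner : MonoBehaviour
{
    public GameObject trainingAreaPrefab;  // maze + agent + zombie spawners
    public int numAreas = 8;
    public float spacing = 60f;            // keep the areas from overlapping

    void Awake()
    {
        for (int i = 0; i < numAreas; i++)
            Instantiate(trainingAreaPrefab,
                        new Vector3(i * spacing, 0f, 0f),
                        Quaternion.identity);
    }
}
```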
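And for point 3, since you mentioned curiosity-driven PPO: to my knowledge, a starting trainer config based on the documented defaults looks roughly like the sketch below. The behavior name and the exact numbers are assumptions on my part, and the file format has changed between ML-Agents releases, so double-check it against the training configuration docs of the version you use:

```yaml
behaviors:
  MazeAgent:                 # must match the Behavior Name on your agent
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-3
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:             # the intrinsic reward you mentioned
        gamma: 0.99
        strength: 0.02       # keep small relative to the extrinsic signal
    max_steps: 5.0e6
    time_horizon: 128
    summary_freq: 10000
```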

What I think you are completely missing here is a really crucial issue, one that is even more important than setting rewards for actions:

Designing Observations.

Depending on what you choose as observations, your agent might learn to solve the task or fail miserably, even if your hyperparameters and reward system are perfect.

The reason is that observations should be chosen “relative” to your agent. For example, a good observation is a distance to whatever is to the side/front and so on; a bad observation is a global position. Why? Imagine an agent whose only task is to find the exit of a maze. With global positions it will learn to solve that specific maze, because it learns to take certain actions at certain positions. If we then change the maze, the agent will only know how to solve the old one. It has no clue how to actually solve a maze in general, since all it learned was which action to take at which specific global position.
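To make the relative-vs-global point concrete, here is a minimal CollectObservations sketch. The exit reference and the raycast setup are assumptions for illustration; in practice you would likely use the Ray Perception Sensor 3D component that ships with ML-Agents for walls/zombies instead of hand-rolled raycasts:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class MazeAgent : Agent
{
    public Transform exit;         // hypothetical reference, set in the Inspector
    const float rayLength = 20f;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Bad: a global position. The agent memorizes "do X at (12, 0, 7)"
        // and the policy breaks on any new maze layout.
        // sensor.AddObservation(transform.position);

        // Better: express everything relative to the agent itself.
        Vector3 toExit = exit.position - transform.position;
        sensor.AddObservation(transform.InverseTransformDirection(toExit));

        // Normalized distances to obstacles ahead/left/right.
        foreach (Vector3 dir in new[] { transform.forward, -transform.right, transform.right })
        {
            bool hitSomething = Physics.Raycast(transform.position, dir,
                                                out RaycastHit hit, rayLength);
            sensor.AddObservation(hitSomething ? hit.distance / rayLength : 1f);
        }
    }
}
```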

So given this information, I can only strongly recommend that you extend your question with your currently chosen observation set, as this is crucial for solving the task.

In addition, you should consider (if possible) simplifying your task. Start without the treasures. This has nothing to do with ML-Agents specifically but with reinforcement learning in general: with these bonus objectives you add a whole extra layer of complexity, because the agent first has to learn to explore the maze randomly and then to find the exit, which is already a difficult task on its own. What could come in handy here is the sequential (curriculum) training that ML-Agents offers: you could, for example, first teach the agent to solve the maze and then add the layer of treasures on top (a sketch follows below).
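On the C# side, that staged setup boils down to reading an environment parameter that the trainer ramps up over the course of training. `num_treasures` here is a made-up parameter name, and the actual lesson thresholds would live under `environment_parameters` in the trainer config:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class MazeAgent : Agent
{
    public GameObject[] treasures;  // hypothetical: all treasure objects in the area

    public override void OnEpisodeBegin()
    {
        // "num_treasures" is an assumed curriculum parameter; the trainer
        // raises it once the agent passes the configured lesson thresholds,
        // so early lessons are plain maze-solving with zero treasures.
        int active = (int)Academy.Instance.EnvironmentParameters
                                          .GetWithDefault("num_treasures", 0f);

        for (int i = 0; i < treasures.Length; i++)
            treasures[i].SetActive(i < active);
    }
}
```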

As general advice for getting familiar with ML-Agents itself, I can only recommend taking some time, picking a simpler task, and trying to solve it on your own, for example one of the example projects that ship with ML-Agents.

Sidenote: why should you listen to me? I did my master's thesis with ML-Agents, and these are the things I learned along the way.