I am training ML-Agents on a mobile game I made.
The game is mostly physics based, with simple single-touch input.
I have given it all the input data it might need, plenty of small example scenarios, and so on.
I gave it a single, simple reward function that increases slightly while it keeps a winning streak (think runner game: the longer it survives, the more reward for every “part” it passes consecutively).
I properly randomized the scenarios as well.
I have even tweaked the hyperparameters in various ways to see if I got any better results, but every run ends the same.
Mastery reaches roughly 80% good gameplay, then flatlines with no further improvement and still plenty of rookie-looking mistakes.
I think I have narrowed it down to “imbalanced data”: the player spends a lot of time going straight or flying through the air, where input doesn’t matter, has shorter action moments where input is life or death, and some period before those moments where anticipation needs to happen.
I think the many “no action” stretches make running the full scenarios a suboptimal training ground, and “simulating” smaller scenarios isn’t really an option either.
So now the question: is there any way I could inform the Academy that the current trained frame was not as interesting as the frames just before and during an action moment? Any method whatsoever to have some control over bias in a long flow of data?
“So now the question: is there any way I could inform the Academy that the current trained frame was not as interesting as the frames just before and during an action moment?”
Yes, through rewards. If it did something right, give it points; if not, take points away (indirectly by giving points for each frame it stays alive, or directly with negative rewards). Especially for simpler games, the current frame alone is already enough to make a reasonable decision (PPO takes the last x frames into account anyway). So if you have, for example, a ray sensor only for enemies, that will definitely be enough to inform the agent that this is important and trigger the right movement.
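A minimal sketch of that kind of reward shaping in an Agent subclass; the PassedSegment()/HasCrashed() checks and the reward magnitudes are made-up placeholders for your own game logic, not anything from the ML-Agents API:

```csharp
// Sketch only: per-frame "stay alive" reward plus streak-scaled rewards and a failure penalty.
using Unity.MLAgents;

public class RunnerAgent : Agent
{
    int streak;

    void FixedUpdate()
    {
        // Small reward for every frame the agent survives, so uneventful frames still carry signal.
        AddReward(0.001f);

        if (PassedSegment())
        {
            streak++;
            // Reward grows with the consecutive-parts streak, matching the original reward idea.
            AddReward(0.1f * streak);
        }

        if (HasCrashed())
        {
            streak = 0;
            // Clear negative signal on failure, then reset the episode.
            SetReward(-1f);
            EndEpisode();
        }
    }

    bool PassedSegment() { return false; } // replace with your game-specific check
    bool HasCrashed()    { return false; } // replace with your game-specific check
}
```

Keep the per-frame reward small relative to the event rewards, otherwise the agent can learn to play for time instead of for the action moments.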
Maybe your observations are wrong. I noticed that giving it quaternions instead of Euler angles makes learning start slower but end up much better, and outputting the rotation as a quaternion instead of Euler angles also works out better in most cases.
Also make the observations relative to the agent's space if possible (the enemy's position in local space with transform.InverseTransformPoint(enemy.transform.position)).
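As a sketch of both suggestions (local-space observations and quaternion rotations) inside CollectObservations; the enemy and body references are assumed fields you would wire up yourself:

```csharp
// Sketch only: observations expressed relative to the agent, with rotation as a quaternion.
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class LocalSpaceObservations : Agent
{
    public Transform enemy;   // assumed reference, assigned in the Inspector
    public Rigidbody body;    // assumed reference, assigned in the Inspector

    public override void CollectObservations(VectorSensor sensor)
    {
        // Enemy position in the agent's local space.
        // Note: InverseTransformPoint expects a world position (Vector3), not a Transform.
        sensor.AddObservation(transform.InverseTransformPoint(enemy.position));

        // Directions and velocities use InverseTransformDirection instead.
        sensor.AddObservation(transform.InverseTransformDirection(body.velocity));

        // Rotation as a quaternion (4 floats) rather than Euler angles.
        sensor.AddObservation(transform.rotation);
    }
}
```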
It would be best if you showed parts of your code and what kind of game it is, to make the bug hunt easier for us.
@wonder-chimp, I’m tempted to wonder whether increasing the ‘decision period’ in the Decision Requester might help. If you don’t know, the Decision Requester requests decisions from your policy and passes them on to the agent. It can repeat the same decision for several frames, depending on the ‘decision period’ set. This could reduce the number of ‘boring’ decisions the AI has to make. But it might not work well if there are only one or two specific frames where the AI must react; if that’s the case, perhaps you could try reducing the decision period instead.
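For reference, a small sketch of adjusting that setting from code. The DecisionRequester component and its DecisionPeriod/TakeActionsBetweenDecisions fields ship with the ML-Agents package (they are normally just set in the Inspector); the value 5 is an arbitrary example:

```csharp
// Sketch: raise the decision period so the policy is queried less often during "boring" stretches.
using Unity.MLAgents;
using UnityEngine;

[RequireComponent(typeof(DecisionRequester))]
public class DecisionPeriodSetup : MonoBehaviour
{
    void Awake()
    {
        var requester = GetComponent<DecisionRequester>();
        // Request a new decision only every 5 Academy steps...
        requester.DecisionPeriod = 5;
        // ...and keep repeating the last action on the steps in between.
        requester.TakeActionsBetweenDecisions = true;
    }
}
```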
I also tend to advocate adding some entropy regularization to encourage exploration. Full disclosure, my own video:
(This is made for Stable Baselines3 PPO, but I imagine ML-Agents PPO also has an entropy regularization option available via the config file?)
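For what it’s worth, ML-Agents PPO does expose this through the beta hyperparameter (entropy regularization strength) in the trainer YAML. A rough sketch of what that section might look like, with a made-up behavior name and example values:

```yaml
# Sketch of a trainer config excerpt; "RunnerAgent" and the numbers are example values.
behaviors:
  RunnerAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-3        # entropy regularization strength; raise it to encourage exploration
      epsilon: 0.2
    max_steps: 5.0e6
    time_horizon: 128
```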