Hello!
I’m currently trying to figure out what the best way would be to implement a “snowball fight” agent.
The agents would be running around a field with obstacles that offer cover, grabbing randomly spawned snowballs, and throwing them at other agents.
I intend to use self-play to improve the bots over time. I’m starting out with a 1v1 battle.
The rewards are based on the difference in HP at the end, where the winner’s reward is reduced by how long the match took. Some snowballs are bigger than others (increasing the chance of hitting the enemy, but also of being blocked by terrain), and they deal varying amounts of damage to HP.
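Concretely, the terminal reward looks roughly like this (the HP scale and the time-penalty weight here are placeholders, not my exact numbers):

```python
def final_reward(my_hp, enemy_hp, elapsed_steps, max_steps, time_penalty=0.5):
    """Terminal reward sketch: signed HP difference, with the winner's
    reward shrunk by how long the episode took."""
    hp_diff = (my_hp - enemy_hp) / 100.0          # assuming HP is in [0, 100]
    if hp_diff > 0:                               # only the winner pays a time cost
        hp_diff *= 1.0 - time_penalty * (elapsed_steps / max_steps)
    return hp_diff
```

The loser’s reward is left untouched so a slow loss isn’t rewarded more than a fast one.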
I have separated throwing at a target into its own agent, which takes in information on the target and agent. No problems here.
Now for the waypoint/movement-controlling agent, I’m struggling.
I have done two experiments so far, each with varying levels of success.
My first experiment was to have the bot go after the closest snowball, and then go after the enemy.
I set this up by having observations for the “target position” as well as the current mode (if it is seeking a snowball or trying to hit the enemy). The output was an angle and magnitude, applied to the direction to the current target (so it can skew the path if it wants to, and go slower/stop as it sees fit).
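In code, the action-to-movement mapping is roughly this (a sketch of the first experiment’s action space; the function and parameter names are mine, not from my actual project):

```python
import math

def steer(agent_pos, target_pos, angle_offset, magnitude):
    """Turn the policy's (angle, magnitude) output into a movement vector
    relative to the direction toward the current target.
    angle_offset is in radians; magnitude is in [0, 1]."""
    dx, dz = target_pos[0] - agent_pos[0], target_pos[1] - agent_pos[1]
    base = math.atan2(dz, dx)            # heading straight at the target
    heading = base + angle_offset        # the policy can skew the path
    return (magnitude * math.cos(heading), magnitude * math.sin(heading))
```

Outputting zero offset and nonzero magnitude already walks the bot straight to the target, which is why I think this version trained so easily.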
This version trained fairly well, with some interesting emergent behaviors: running away after throwing, unarmed bots charging at armed ones (since throwing is all they know, being unarmed leaves them few options), etc.
I assume it trained easily because the bot had a good chance of moving toward a snowball and then the enemy: it already had a path and didn’t need to do much (other than output a nonzero magnitude) to receive a reward.
The problem was that after it chose a target snowball, it would stubbornly stick with that specific one unless the opponent grabbed it first. I tried adding re-targeting, which let the bot switch to the new closest snowball, but the bot would spam it (it used overlapSphere), causing lag.
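One fix I’m considering for the spam is rate-limiting the re-target action so the expensive overlap query runs at most once per cooldown window. A minimal sketch (the cooldown length is an arbitrary assumption):

```python
class Retargeter:
    """Rate-limits re-target requests so the expensive nearest-snowball
    search (e.g. a physics overlap query) runs at most once per window."""

    def __init__(self, cooldown_steps=30):
        self.cooldown_steps = cooldown_steps
        self.last_step = -cooldown_steps     # allow a query on step 0

    def try_retarget(self, step, find_closest):
        if step - self.last_step < self.cooldown_steps:
            return None                      # cooling down: keep the old target
        self.last_step = step
        return find_closest()                # run the expensive query
```

The agent can still request a re-target every step; the request is simply ignored until the cooldown expires.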
Also, in principle, since it didn’t know about the snowballs in its surroundings, it couldn’t plan around secondary objectives, like moving to a strategic attack point that is also close to many snowballs, or choosing a path with contingency plans.
My second experiment was to show it the stats of the five closest snowballs, as well as the enemy.
This version would output X, Z (normalized) and a magnitude, but was not given an explicit target destination, and its controls were not biased toward heading to the snowballs.
This version had a much harder time learning, but it was eventually able to move toward a snowball and throw it. It was also not constrained to moving relative to a predetermined waypoint, making it much more dynamic and flexible.
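For reference, the second experiment’s observation is roughly this shape: the k closest snowballs (position plus size) and the enemy, flattened into one normalized vector, zero-padded when fewer than k snowballs exist. The layout and names below are a sketch, not my exact encoding:

```python
def build_observation(agent_pos, enemy_pos, snowballs, arena_half=20.0, k=5):
    """Flatten agent, enemy, and the k closest snowballs into one
    normalized observation vector, zero-padding missing slots."""
    def norm(p):
        return [p[0] / arena_half, p[1] / arena_half]

    closest = sorted(
        snowballs,
        key=lambda s: (s["pos"][0] - agent_pos[0]) ** 2
                    + (s["pos"][1] - agent_pos[1]) ** 2,
    )[:k]
    obs = norm(agent_pos) + norm(enemy_pos)
    for s in closest:
        obs += norm(s["pos"]) + [s["size"]]
    obs += [0.0] * (3 * (k - len(closest)))   # zero-pad empty snowball slots
    return obs
```

Keeping the vector a fixed length (here 4 + 3k values) matters because the policy network expects a constant-size input regardless of how many snowballs are currently spawned.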
Ultimately, my question is, which one should I pursue?
I can go with option one, providing the bot with the closest targets and letting it output the index of the snowball to pursue. This version may train faster than the second approach, but end up with shallower behavior.
Alternatively, I can take a gamble and see if the bot can learn to move without any guidance, in the hope that its freedom from a predetermined direction will let it explore different strategies.