Seeking Advice for a Snowball Fight agent setup

Hello!

I’m currently trying to figure out what the best way would be to implement a “snowball fight” agent.
The agents would be running around a field with obstacles that offer cover, grabbing randomly spawned snowballs, and throwing them at other agents.

I intend to use self-play to increase the chances that the bots will get better. I’m starting out with a 1v1 battle.

The reward is based on the difference in HP at the end, with the winner’s reward reduced by how long the match took. Some snowballs are bigger than others (increasing the chance of hitting the enemy, but also of being blocked by terrain), and they deal varying amounts of HP damage.
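To make the reward concrete, here is a minimal sketch of the scheme described above: final HP difference, with only the winner paying a time penalty. The function name, the penalty scale, and the normalization by max episode length are all my assumptions, not your actual code.

```python
# Hedged sketch of the described reward: HP difference at the end,
# with the winner's reward reduced by how long the episode took.
# The 0.5 penalty scale and step normalization are assumptions.

def final_reward(my_hp: float, enemy_hp: float,
                 episode_steps: int, max_steps: int,
                 time_penalty: float = 0.5) -> float:
    """Final HP difference; only a winning agent is penalized for duration."""
    hp_diff = my_hp - enemy_hp
    if hp_diff > 0:  # only the winner pays the time penalty
        hp_diff -= time_penalty * (episode_steps / max_steps)
    return hp_diff
```

A shaped reward like this keeps the zero-sum structure (loser's reward mirrors the winner's HP deficit) while still pushing the winner to finish quickly.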

I have separated throwing at a target into its own agent, which takes in information on the target and agent. No problems here.

Now for the waypoint/movement-controlling agent, I’m struggling.

I have done two experiments so far, each with varying levels of success.

My first experiment was to have the bot go after the closest snowball, and then go after the enemy.

I set this up by having observations for the “target position” as well as the current mode (if it is seeking a snowball or trying to hit the enemy). The output was an angle and magnitude, applied to the direction to the current target (so it can skew the path if it wants to, and go slower/stop as it sees fit).
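The angle-plus-magnitude action described above might look like the following sketch, where the policy’s angle output rotates the straight-line heading to the target and the magnitude lets it slow down or stop. The function name, action ranges, and 2D (x, z) representation are assumptions on my part.

```python
import math

# Hedged sketch: apply the policy's (angle_offset, magnitude) action to the
# direction toward the current target, so the agent can skew its path or stop.
# Radians for the angle and magnitude in [0, 1] are assumed ranges.

def movement_from_action(agent_pos, target_pos, angle_offset, magnitude):
    """Return an (x, z) velocity: heading to target rotated by angle_offset,
    scaled by magnitude."""
    dx = target_pos[0] - agent_pos[0]
    dz = target_pos[1] - agent_pos[1]
    base = math.atan2(dz, dx)        # heading straight at the target
    heading = base + angle_offset    # skewed by the policy's angle output
    return (magnitude * math.cos(heading), magnitude * math.sin(heading))
```

With a zero angle offset and nonzero magnitude the agent simply walks at the target, which matches your observation that a near-default action already collects reward.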

This version trained fairly well, with some interesting bot behaviors, like running after throwing, unarmed bots running at armed bots (they only know how to throw, which makes being unarmed hard), etc.

I assume it trained easily because the bot had a good chance of moving towards a snowball and then the enemy: it already had a path, and didn’t need to do anything other than output a nonzero magnitude to receive a reward.

The problem was that after it chose a target snowball, it would stubbornly stick to that specific one unless the opponent took it first. I tried adding a re-targeting action, which would let the bot switch to the new closest snowball, but the bot would spam it (the query used OverlapSphere), causing lag.
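One common fix for that kind of spam is to gate the expensive query behind a cooldown, so a retarget request is only honored every N steps. Below is a hedged sketch in plain Python (in Unity the query itself would be `Physics.OverlapSphere`); the class name and cooldown length are made up for illustration.

```python
# Hedged sketch: throttle the retarget action so the expensive
# nearest-snowball query runs at most once per cooldown window.

class RetargetGate:
    def __init__(self, cooldown_steps: int = 30):
        self.cooldown_steps = cooldown_steps
        self._last_step = -cooldown_steps  # allow an immediate first query

    def try_retarget(self, step: int) -> bool:
        """Return True if a retarget query is allowed at this step."""
        if step - self._last_step >= self.cooldown_steps:
            self._last_step = step
            return True
        return False
```

Even when the agent spams the action, the query cost is bounded by the cooldown, and the policy can still learn that retargeting “works” often enough to be useful.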

Also, in principle, since it did not know about the snowballs in its surroundings, it was unable to plan around secondary objectives, like moving to a strategic attack point that’s also close to many snowballs, or choosing a path with contingency plans.

My second experiment was to show it the stats of the five closest snowballs, as well as the enemy.

This version would output X and Z (normalized) plus a magnitude, but it was not given an explicit target destination, and its controls were not biased toward heading to the snowballs.
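For reference, the observation for this second setup could be flattened into a fixed-size vector like the sketch below: enemy position relative to the agent, then the k nearest snowballs as (relative x, relative z, size), zero-padded when fewer exist. The exact fields and the padding scheme are assumptions.

```python
# Hedged sketch of the second experiment's observation vector:
# enemy relative position + stats of the k closest snowballs,
# zero-padded so the vector size stays fixed.

def build_observation(agent_pos, enemy_pos, snowballs, k=5):
    """snowballs: list of (x, z, size) tuples. Returns a flat list of
    2 + 3*k floats."""
    ax, az = agent_pos
    obs = [enemy_pos[0] - ax, enemy_pos[1] - az]
    nearest = sorted(snowballs,
                     key=lambda s: (s[0] - ax) ** 2 + (s[1] - az) ** 2)[:k]
    for x, z, size in nearest:
        obs += [x - ax, z - az, size]
    obs += [0.0] * (3 * (k - len(nearest)))  # pad when fewer than k exist
    return obs
```

A fixed-size, padded vector keeps the network input shape constant as snowballs spawn and despawn, which is the usual requirement for a vector-observation policy.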

This version had a much harder time learning, but it was eventually able to go towards a snowball and throw it. It was also not constrained to moving relative to a predetermined waypoint, which made it much more dynamic and flexible.

Ultimately, my question is, which one should I pursue?

I can go with option one, providing the bot with the closest targets and letting it output the index of the snowball to pursue. This version may train faster than the second approach, but end up with shallower behavior.
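If option one is extended so the policy picks a target by index, the discrete branch can be masked when fewer than k snowballs are visible, so the agent never selects an empty slot. A hedged sketch (in ML-Agents the equivalent would be discrete action masking via `Agent.WriteDiscreteActionMask`); the branch layout here is invented for illustration.

```python
# Hedged sketch: mask for a discrete target-selection branch.
# Branches 0..k-1 pick one of the k closest snowballs; branch k picks the enemy.

def index_action_mask(num_visible: int, k: int = 5):
    """Return a list of booleans, True = action allowed."""
    return [i < num_visible for i in range(k)] + [True]  # enemy always valid
```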

Alternatively, I can take a gamble and see if the bot can learn to move without any guidance, with the hope that its freedom and lack of predetermined direction will let it explore different strategies.

There’s no standard answer for which approach is better; it depends on the use case and what behavior you want.
As you said, the two resulted in pretty different behaviors, so one might be more suitable than the other depending on the situation.

Your two approaches are actually training the agent on different tasks: in the first, the agent is given a fixed goal and must reach it; in the second, it must first find a goal and then reach it. The first is obviously simpler, so it’s easier to train, but the second could result in more interesting behaviors.

If what you want to know is whether approach 2 would work, I’d say it’s generally not too hard to train an agent on this kind of “search and collect” task. There’s an example scene, FoodCollector, in the ML-Agents repo that demonstrates this and can be trained pretty reliably in a reasonable time. The agent is trained to collect as much “good food” as possible and avoid the bad ones. There are multiple agents in the same area, and an agent can shoot others to stop them from collecting food, so it’s also trained to shoot at the same time (not exactly the same as a snowball fight, but still a good reference).