How to make a cooperative multi-agent environment?

I would like to build a cooperative multi-agent environment where all agents slowly starve and there is not enough food for everyone. To avoid dying, they need to share the food, so every once in a while a different agent should take it. How can I shape the reward so that the agents think about the others?


Just an idea, haven’t tried this myself: Agents have a health value, like 1 = fully fed, 0 = starved/dead. The value decreases over time and is reset to 1 when an agent picks up food. Each agent observes the health values of all agents. All agents are penalized equally whenever one agent dies.
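Off the top of my head, a minimal sketch of what that could look like in ML-Agents terms (the decay rate, the m_AllAgents array, and how it gets populated are all assumptions):

using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class StarvingAgent : Agent
{
    // All agents in the scene, this one included (assumed to be
    // assigned in the inspector or found at startup).
    [SerializeField] StarvingAgent[] m_AllAgents;

    // Hypothetical decay rate: health drains from 1 to 0 in ~1000 steps.
    const float k_DecayPerStep = 0.001f;

    public float Health { get; private set; } = 1f;

    // Call this from whatever handles the food pickup collision.
    public void OnFoodPickup()
    {
        Health = 1f; // picking up food resets the agent to fully fed
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Every agent observes the health of all agents in a fixed
        // order, so the observation size never changes.
        foreach (var agent in m_AllAgents)
        {
            sensor.AddObservation(agent.Health);
        }
    }

    void FixedUpdate()
    {
        Health -= k_DecayPerStep;
        if (Health <= 0f)
        {
            // All agents are penalized equally whenever one agent dies.
            foreach (var agent in m_AllAgents)
            {
                agent.AddReward(-1f);
            }
            EndEpisode(); // this agent starved
            Health = 1f;
        }
    }
}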

I have a similar issue with such a cooperative multi-agent environment… So far my idea is: add a reward of -1 for each agent whenever any other agent dies. This requires each agent to observe the food information of all agents. Then add a small positive reward when they make it through a certain amount of time, so that they spread the food to the agents with less food in order to win more total reward. I’m not sure whether it works.
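As a rough sketch of that scheme (the reward sizes and the m_Agents array are assumptions, and the two magnitudes would need tuning against each other):

using Unity.MLAgents;

// Sketch of the group reward scheme described above.
public class FoodSharingRewards
{
    readonly Agent[] m_Agents;

    public FoodSharingRewards(Agent[] agents)
    {
        m_Agents = agents;
    }

    // Called every environment step while everyone is still alive:
    // a small bonus for making it through another step together.
    public void RewardSurvival()
    {
        foreach (var agent in m_Agents)
        {
            agent.AddReward(0.001f); // arbitrary size, tune vs. the penalty
        }
    }

    // Called whenever any agent starves: everyone shares the blame, so
    // well-fed agents are pushed to leave food for starving ones.
    public void PenalizeDeath()
    {
        foreach (var agent in m_Agents)
        {
            agent.AddReward(-1f);
        }
    }
}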

Thank you for the ideas. I have already implemented a starvingLevel observation, but the agents don’t see each other’s values. I am not sure that adding a list of starving-level observations will help without the agents seeing who has which starvingLevel, but it can be tried.

Thanks for the response. I have an environment similar to what you are describing, except for the small rewards for passing time.

Credit assignment for such rewards can be really difficult to learn, and without linking the “starving” list to the actual objects, it will be nearly impossible for the agents to understand what’s going on.

I have a similar issue with my simulation. Agents use 3D raycast sensors, so they can tell what they are looking at and the distance to the object, but they are unable to get the “state” of the observed object.

I don’t know how you are implementing the starving level, but I’d try tags. They are detectable by the raycast sensor and easy to implement:

if (health <= x)  // x = starving threshold
{
    agent.tag = "starvingAgent";
}
else
{
    agent.tag = "healthyAgent";
}

Of course, if you want to detect more than one state at a time, tags are useless.

I was thinking the same: if we had the opportunity to get more information out of the raycasts, that would be awesome. Right now I have one tag for both states, and I can change it to starvingAgent, so I will update here if it goes well.

I’m trying to understand how to get information out of the raycast sensor. If it can detect tags, it should be able to return other features too. But I have no idea how to do that; I just started with Unity and my knowledge is really limited.

If I find something, I’ll let you know :slight_smile:


With the latest release 4 of ml-agents, you should be able to retrieve your custom data via GetComponent() from the GameObject that was hit by a ray. Please see https://github.com/Unity-Technologies/ml-agents/pull/4111
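For example, the per-agent state could live in a small component like this (a hypothetical Health class, purely for illustration), which you would then read back from the hit GameObject with GetComponent&lt;Health&gt;():

using UnityEngine;

// Hypothetical component holding the per-agent starving level,
// normalized so that 1 = fully fed and 0 = starved.
public class Health : MonoBehaviour
{
    [Range(0f, 1f)]
    public float Value = 1f;
}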

Wow, this is confusing.
OK, so we can pull info out of the hit objects, but then how can we add it to the observation space?
If we add the values manually, we will be varying the observation space size every time (hit 4 objects, add 4 health levels; hit 3, add 3; and so on…).

I’m reading the raycast sensor script:
ml-agents/com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensor.cs

Based on its structures, the script already knows how many inputs it will add to the NN (obviously :slight_smile: ).
Each ray has an index, and each ray returns three pieces of data (if I got it right): the first is a one-hot encoding of the hit object’s tag, the second is a hit/miss flag (don’t know why, but it’s 0 for hit and 1 for miss), and the third is the normalized distance to the target (1 = nothing in sight).
So, with 3 detectable tags, a ray that misses everything returns something like ([0, 0, 0], 1, 1f), and a ray that hits the first tag at 75% of max distance returns ([1, 0, 0], 0, 0.75f).

If we want to add observations (i.e. the health values of the hits) without messing with the observation size, the only way around it I can think of is to:
1: make an array the same size as the number of raycast rays, initialized to all zeros, and add it with AddObservation();
2: each time observations are collected, cycle through the raycastSensor output;
3: manually get the health values from the hits, normalize them, and add them to the health array at the same ray index.

That would be like adding a 4th value to the rays’ output[ ] (there’s a sketch of this further down, once we can actually get at the hit objects). If you’re confident in your skills, you could probably modify the RaySensor script to do that automatically, but I’d rather not touch it.

@mbaske Actually, I can’t find any reference to the GameObject we hit.
From the raySensor script:

var rayOutput = new RayPerceptionOutput.RayOutput
{
    HasHit = castHit,
    HitFraction = hitFraction,
    HitTaggedObject = false,
    HitTagIndex = -1
};

Those are the values returned by the rays: (bool, distance, bool, tagIndex).

How can I access the GameObject? I can’t find any reference.
Can you help us out?

@m4l4 You’re looking at an old version of the code. The most recent release (release_4, version 1.2.0-preview) saves the GameObject in the RayOutput: https://github.com/Unity-Technologies/ml-agents/blob/release_4/com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensor.cs#L167

var rayOutput = new RayPerceptionOutput.RayOutput
{
    HasHit = castHit,
    HitFraction = hitFraction,
    HitTaggedObject = false,
    HitTagIndex = -1,
    HitGameObject = hitObject
};
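Putting this together with the fixed-size health-array idea from earlier in the thread, a rough sketch could look like the following. It assumes release_4, a hypothetical Health component on each agent like the one sketched above, and a manual call to RayPerceptionSensor.Perceive(), which re-runs the raycasts on top of the sensor’s own pass:

using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class RayHealthAgent : Agent
{
    // The ray sensor component already on this agent (assumed to be
    // assigned in the inspector).
    [SerializeField] RayPerceptionSensorComponent3D m_RaySensorComponent;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Re-run the same casts the sensor performs so we can inspect
        // the hit objects. This costs a second set of raycasts per decision.
        var input = m_RaySensorComponent.GetRayPerceptionInput();
        var output = RayPerceptionSensor.Perceive(input);

        // One extra float per ray: the ray count is fixed by the sensor
        // setup, so the observation size never varies. 0 = no health read.
        foreach (var rayOutput in output.RayOutputs)
        {
            var health = 0f;
            var hitObject = rayOutput.HitGameObject;
            if (hitObject != null)
            {
                var h = hitObject.GetComponent<Health>();
                if (h != null)
                {
                    health = h.Value; // assumed already normalized to 0..1
                }
            }
            sensor.AddObservation(health);
        }
    }
}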

Good way to waste half a day :smile::smile::smile: But at least I’ve learned a lot about the sensors :smile:

@celion_unity Thank you very much, you spared me a pretty good headache.

There’s still a problem I cannot solve.

To access rayOutput while we are using a DecisionRequester, we don’t want to manually call Perceive() or any other function that runs the casts, because then we’d be duplicating the sensor’s work just to add observations by hand.
The raycasts for the RaySensor happen during ISensor.Write(), which is called automatically every time a decision is requested, but from different places during inference and training.

Since we want to use the rayOutput data to add new observations, we have to grab it when the DecisionRequester kicks in, without manually requesting it.

I’m jumping back and forth through the documentation for ISensor, DecisionRequester, RayCast and RayPerceptionSensor3D, but I’m too much of a noob to figure it out. (I’m not even sure my explanation makes sense.)

Any ideas?