Observation normalisation - relative or not?

In the documentation it says that agent observations should be normalised to a range between 0 and 1. I am creating a scenario in which the size of the map can expand and contract. One of my agent's observations is the distance to some point on the map. Should I normalise this so that 1 is the largest distance possible on any map, or is it fine for 1 to be the largest distance relative to the size of the current map?

Just to clarify, 1 would be the furthest point in the map, and the distance to some target would be somewhere between 0 and 1. The question is whether 1 should always be the furthest point in any map, or just for the particular map which the agent is inhabiting. Will the observations, and inference with the trained model, still be relevant on other maps despite the relative change in the max distance?
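To make the two options concrete, here's a minimal sketch of what I mean (the constant and function names are just placeholders, not anything from my actual code):

```python
# Sketch of the two normalisation options being discussed (names are hypothetical).

GLOBAL_MAX_DISTANCE = 500.0   # assumed: the largest distance possible on ANY map


def normalise_relative(distance, current_map_max):
    """Option A: 1.0 is the furthest point on the CURRENT map."""
    return distance / current_map_max


def normalise_global(distance):
    """Option B: 1.0 is the furthest point on the largest possible map."""
    return distance / GLOBAL_MAX_DISTANCE
```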

The agent will learn the relationship between distance and other features (speed, time, etc., depending on your code).
Varying inputs can make it harder for the agent to learn those relationships.

Does that mean having 1 as the relative max is a bad idea, and that a global max (set at 1) should be used?

I'm no expert in the field, but here's my reasoning:

Imagine using position like GPS, to give the agent a sense of its position in the space. On a 2D map, you make it so that the input range goes from -1 to 1, the centre being (0, 0), the upper left corner (-1, -1), the lower right (1, 1). The agent will learn its position relative to the env, no matter the size of the map. A sketch of this follows below.
If I tell you, as a person, that your coordinates in a room are (0, 0.5), you will have no idea about the size of the room, but you will know that you are halfway between the left and right walls, 3/4 of the way from the top wall and 1/4 from the bottom one.
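A minimal sketch of that idea, assuming the map is an axis-aligned rectangle with world coordinates starting at (0, 0) in the upper left corner (the function and parameter names are mine, not from any library):

```python
def normalise_position(x, y, map_width, map_height):
    """Map world coordinates in [0, width] x [0, height] to [-1, 1],
    with (0, 0) at the centre of the map."""
    norm_x = 2.0 * x / map_width - 1.0
    norm_y = 2.0 * y / map_height - 1.0
    return norm_x, norm_y


# e.g. normalise_position(5.0, 7.5, 10.0, 10.0) -> (0.0, 0.5);
# the same relative location on a 1000-unit map, (500, 750),
# produces exactly the same observation.
```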

But you are talking about a precise distance from an object/area. I suppose the agent will use this info to determine things like how much time it takes to get there, how many actions it can take while covering that distance, how much food it needs to get there alive (just guessing at what your code does).
What if I tell you to learn to do something in 1 minute, but "1 minute" does not always correspond to the same amount of real time?

So yes, I think a global max can lead to a more stable comprehension of the environment.
Of course I might be wrong, as I said I'm no expert after all :slight_smile:

I would have thought that, if the maps were randomised during training and the agent couldn't 'overfit' to one particular map size, it would learn that the sizes were relative, based on its other observations about the world.

It could easily be as you say. It's an interesting topic; a more professional answer is needed.


You could also observe different metrics in order to give the agent additional information. If it needs to know its relative position on a map, as well as absolute distances to detectable objects or specific points on the map, why not observe both? Relative agent coordinates would scale with the map size, assuming upper left corner (-1, -1), lower right (1, 1). But distance_to_point/detection_radius would always be a fraction between 0 and 1 that’s independent of the map size.
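Something like this sketch, for example, combining both kinds of observation (all names here are hypothetical, and it assumes a fixed detection radius beyond which the distance is simply clamped to 1):

```python
import math


def collect_observations(agent_pos, target_pos, map_width, map_height,
                         detection_radius):
    # Relative position: scale-invariant, in [-1, 1] with (0, 0) at the map centre.
    rel_x = 2.0 * agent_pos[0] / map_width - 1.0
    rel_y = 2.0 * agent_pos[1] / map_height - 1.0

    # Absolute distance divided by a fixed detection radius: always in [0, 1]
    # regardless of map size.
    dist = math.hypot(target_pos[0] - agent_pos[0],
                      target_pos[1] - agent_pos[1])
    norm_dist = min(dist / detection_radius, 1.0)

    return [rel_x, rel_y, norm_dist]
```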