# ***HALF TRAINED NETWORK***

I would like to ask what is happening: sometimes my network trains well and I get the result I want, but sometimes it does not…

Where could the problem be?

Probably in the rewards?

Thank you!

It could be so many things. You’ll have to provide more information on what you are doing before anyone can try and help you.


I hope I don’t tire you. I’ll write it as concisely as I can.

In a few words…

I have some agents that are trained to navigate from an initial point to their goal, avoiding obstacles and other agents.

I start by training a single agent that goes from an initial point to its goal, without obstacles or other agents.
Sometimes it trains well, but sometimes it does not.
The observations are the position of the agent and the position of the goal.

Let’s say the initial position is (-5, -5) and the goal is (5, 5).
The rewards are:

• Every step, calculate the distance to the goal and compare the current distance with the previous one.
• Also compare the direction vector (current_position - goal_position) with the previous one, to see whether the agent moves in the goal direction; if the vectors are parallel or coincide, it is moving toward the goal.

```
first_distance = initial distance from the agent to the goal

if (current_distance < previous_distance && same_direction)
{
    // approaching the goal, heading toward it
    reward = 0.5;
}
else if (current_distance < previous_distance && !same_direction)
{
    // getting closer, but not heading straight at the goal
    reward = 0.001;
}
else if (current_distance > previous_distance)
{
    // moving away from the goal, regardless of direction
    reward = -0.5;
}

if (agent reaches the goal)
{
    reward = 1;
    Done();
}

if (current_distance > first_distance)
{
    // wandered farther away than the starting distance
    reward = -1;
    Done();
}
```
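The conditions above can be sketched as a single reward function. This is an illustrative Python sketch, not the ML-Agents C# API; the function name, the tuple positions, and the `goal_radius` threshold are assumptions for the example:

```python
import math

def step_reward(pos, prev_pos, goal, first_distance, goal_radius=0.5):
    """Shaped reward following the conditions above.

    Returns (reward, done). Positions are (x, y) tuples; goal_radius
    is an assumed threshold for "reached the goal".
    """
    dist = math.dist(pos, goal)
    prev_dist = math.dist(prev_pos, goal)

    # "Same direction" check: does the step vector point toward the goal?
    step = (pos[0] - prev_pos[0], pos[1] - prev_pos[1])
    to_goal = (goal[0] - prev_pos[0], goal[1] - prev_pos[1])
    same_direction = (step[0] * to_goal[0] + step[1] * to_goal[1]) > 0

    if dist <= goal_radius:
        return 1.0, True           # reached the goal: episode done
    if dist > first_distance:
        return -1.0, True          # farther than the start: episode done
    if dist < prev_dist:
        # approaching: larger reward if heading straight at the goal
        return (0.5 if same_direction else 0.001), False
    return -0.5, False             # moving away from the goal
```

For instance, a step from (0, 0) to (1, 1) with the goal at (5, 5) shrinks the distance while pointing at the goal, so it earns the 0.5 reward.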

Can you think of any conditions that I have not taken into account?
Thank you again!!!

Hi,

I think you are overcomplicating the rewards. In general they should be as simple as possible. In your case I would just do:

Agent gets to the goal: +1

And that’s it. From this simple reward it should eventually learn that moving towards the goal will get it a reward.

You may wish to put a limit on the number of steps (or perhaps the amount of time) the agent should take to get to the goal. If the agent doesn’t make it within that number, you can set the reward to -1.

In general, avoid micro-managing the reward system. All those conditions you listed out, the agent will learn on its own, given enough time. Always start with the simplest possible reward (+1 when goal is reached).

You might also start on a small enough space so that agent has a good chance of making it to the goal within a reasonable amount of time. If that works, slowly expand the space and see how it performs.
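Dan's sparse scheme could look something like this. A minimal sketch, assuming a step counter and a boolean goal flag; `max_steps=100` and the function name are assumptions for the example:

```python
def sparse_reward(reached_goal, steps, max_steps=100):
    """Simplest possible reward: +1 at the goal, -1 on timeout,
    0 everywhere else.

    Returns (reward, done). max_steps is an assumed episode limit.
    """
    if reached_goal:
        return 1.0, True      # success: terminate with +1
    if steps >= max_steps:
        return -1.0, True     # timed out: terminate with -1
    return 0.0, False         # otherwise no shaping at all
```

Everything in between is left to the agent to discover on its own, which is the point of keeping the reward sparse.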

Hope this helps,
Dan


Hello again!!
Thank you so much!!!
I tried keeping only one reward, as you said: reward 1 only when the agent reaches the goal, with max steps = 100.
It seemed to work and I was very happy.
But when I tried to train my agent a second time, it never reached the goal. It got trapped on a specific path as before, the reward stays at 0, and it never tries another path that would earn it some reward.

Can you post a screenshot of the tensorboard output for the failed and successful runs?

ok!! Successful runs

Failed runs

The reward is 0.

Honestly, I don’t know what else to do…

Errr… those failure graphs look like a bug of some sort. They are not supposed to be able to look like that: you are showing 3 values for each step count. Maybe file a bug at: Issues · Unity-Technologies/ml-agents · GitHub


oo ok!!! Thank you very much for your time and for your help!!!
:)