Hello,
When using the OnActionReceived function during training, I notice that the values I get in actions.ContinuousActions change at a good rate at the beginning (approx. 30 sec), but after that they get stuck on values of +1 or -1.
The action vector is continuous with 3 possible actions.
What could be the problem?
I attached pictures of the relevant code.
Hi,
The fact that all the action outputs converged to +1 or -1 shows that your training setup (observations, reward, etc.) is reinforcing the agent to output +1 or -1, and it thinks that is the best strategy to get high rewards.
Without more information about your specific environment and what your goal is, it’s hard to tell what’s going wrong here. It could be your environment, or your reward is not giving the agent the right signal about the desired behavior, or something else.
You can start by checking that your environment is set up correctly (for example, that actions are applied correctly) and that the reward signals actually encourage the agent to learn the desired behavior.
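One rough way to put a number on "stuck at +1 or -1" is to log the action values and measure how many of them sit at the limits. The snippet below is only a sketch: it assumes you have already written the values from actions.ContinuousActions to a log and loaded them as a list of per-step lists. The function name `saturation_fraction`, the 0.99 threshold, and the sample data are all made up for illustration.

```python
def saturation_fraction(action_log, threshold=0.99):
    """Return the fraction of logged action components with |value| >= threshold.

    action_log: list of steps, each step a list of continuous action values
    (hypothetical format; adapt to however you exported your log).
    """
    values = [abs(a) for step in action_log for a in step]
    if not values:
        return 0.0
    saturated = sum(1 for v in values if v >= threshold)
    return saturated / len(values)


# Made-up sample data: 3 continuous actions per step.
early = [[0.1, -0.4, 0.7], [0.3, 0.2, -0.6]]
late = [[1.0, -1.0, 1.0], [1.0, -1.0, 0.95]]

print("early:", saturation_fraction(early))
print("late:", saturation_fraction(late))
```

If the fraction climbs toward 1 while the reward is not improving, that points at the setup (reward, observations, or network settings) rather than at a strategy the agent has genuinely learned.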
Hey,
There was really no reason for the agent to do so.
When I changed the number of layers in the NN settings from 2 to 3, it worked and the agent learned well!
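For anyone who finds this later: in recent ML-Agents releases the layer count is normally set in the trainer configuration YAML under `network_settings`. The fragment below is a sketch of what that change might look like, not my exact file. The behavior name `MyAgent`, the `ppo` trainer type, and the `hidden_units` value are placeholders I am assuming for illustration. Only the change of `num_layers` from 2 to 3 comes from this thread.

```yaml
behaviors:
  MyAgent:                # hypothetical name; must match the Behavior Name on the agent
    trainer_type: ppo     # assumed; the thread does not say which trainer was used
    network_settings:
      hidden_units: 128   # placeholder width, not taken from the thread
      num_layers: 3       # was 2 when the actions got stuck at +1 / -1
```

Older ML-Agents versions used a flatter config layout, so the exact nesting may differ depending on your release.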