Unable to reach perfection in an extraordinarily simple environment

Hello! I have recently begun running tests with Unity ML-Agents.

After conducting some simple tests and finding the agents very stubborn learners in the environments I put them in, I tried to create the simplest environment I possibly could. I call him YesBot, and he should always answer "yes" [code below]. YesBot receives a value, 1 or 0, and outputs a value, 1 or 0. If his output matches the input, he earns a positive reward; if not, he receives a negative reward.

Despite how simple this environment is, I noted that after 100,000 steps (as recorded in the Windows Terminal output) the bot held a steady 60% accuracy rate. I ran the numbers from my Debug.Log("Right!"/"Wrong!") output several times during the experiment and found that after about 20,000 steps it sat at 60% accuracy to within a fraction of a percent, and it never progressed above 60%. While watching my visual interpretation in the Scene view (across 36 agents), I noted that the agents were 100% correct on every third decision, and only ~50% correct on the other two. This behavior became incredibly consistent and explains the 60% average accuracy. The problem persisted even when I used a trained model running in inference mode.

I am using a single discrete branch with 2 possibilities. My behavior type is set to Default. I do have a YAML file I am using; I have only changed the behavior name and the max steps.
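For context, an ML-Agents trainer config for a behavior like this looks roughly as follows. This is an illustrative sketch, not the poster's actual file: the behavior name matches the agent described above, but every hyperparameter value here is an assumption.

```yaml
# Sketch of an ML-Agents PPO trainer config (values are illustrative).
behaviors:
  YesBot:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 1024
      learning_rate: 3.0e-4
    network_settings:
      hidden_units: 32
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 100000
```

The key under `behaviors:` must match the Behavior Name set on the agent's Behavior Parameters component, which is presumably the name the poster changed.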

I am very new and understand that I may be overlooking simple concepts. If you don't have an answer, I would appreciate it if you could point me toward documentation that might.

-Rylan Yancey, Amateur dev

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

public class YesBot : Agent
{
    int yes;

    [SerializeField] Material Win;
    [SerializeField] Material Lose;

    public override void OnEpisodeBegin()
    {
        yes = Random.Range(0, 2); // returns 0 or 1 (the int max is exclusive)
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(yes);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        var renderer = GetComponent<MeshRenderer>();

        if (actions.DiscreteActions[0] == yes)
        {
            AddReward(+5f);
            Debug.Log("Right!");
            renderer.material = Win;
        }
        else
        {
            AddReward(-5f);
            Debug.Log("Wrong!");
            renderer.material = Lose;
        }

        EndEpisode();
    }
}

Random.Range() seems to be inclusive on min and max value so your input might actually be [0, 1, 2] instead of [0, 1]

I think the max value given by Random.Range() for integers is exclusive, so I am not sure that is the problem.

https://docs.unity3d.com/ScriptReference/Random.Range.html

I tested it beforehand. Random.Range(0, 2) always returns either 0 or 1: the 2 is exclusive, the 0 inclusive.

Just an idea for debugging: instead of random, set yes always to 0 and run the training, then set yes always to 1 and run the training.

I saw a YouTube video about an old version of ML-Agents where an episode length of 1 caused problems. Can you try making your episodes at least 2 steps long?
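One minimal way to try this is to end the episode only after two decisions. This is a sketch against the YesBot class from the original post; the `decisionsThisEpisode` counter is a made-up field, and the ±1 reward is illustrative.

```csharp
// Sketch: let each episode span two decisions instead of one.
// Assumes the YesBot agent from the post; `decisionsThisEpisode` is hypothetical.
int decisionsThisEpisode;

public override void OnEpisodeBegin()
{
    decisionsThisEpisode = 0;
    yes = Random.Range(0, 2); // 0 or 1 (int max is exclusive)
}

public override void OnActionReceived(ActionBuffers actions)
{
    AddReward(actions.DiscreteActions[0] == yes ? +1f : -1f);
    yes = Random.Range(0, 2); // fresh target for the next decision

    decisionsThisEpisode++;
    if (decisionsThisEpisode >= 2)
    {
        EndEpisode();
    }
}
```

Note that the observation has to be refreshed between decisions (here by re-rolling `yes`), otherwise the second decision within the episode is trivially the same as the first.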

Also, can you please show your config.yaml file?

What are your training results in Python (i.e., the mean reward and the standard deviation)?
Also, did you edit the training file / behaviours at all?
I did a test using your script and it seems to be training fine here: at 25k steps it has reached a 0.999 mean with 0.033 std, so I would say the agents are very nearly perfect. However, when using the trained brain it does indeed only get it right 60% of the time… very odd.
Edit: found the problem. Set the decision period to 1 in the Decision Requester and the agents will work correctly. :)
(I am using ML-Agents version 0.27.0.)
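The fix above can be applied in the Inspector, or from code if you prefer. Here is a sketch assuming the standard DecisionRequester component from the Unity.MLAgents namespace; the class name `YesBotSetup` is made up.

```csharp
using UnityEngine;
using Unity.MLAgents;

// Sketch: make the agent request a decision on every Academy step.
// With a larger DecisionPeriod, the policy only acts on a fraction of steps
// and stale actions get repeated in between, which would match the
// "correct on every Nth decision" pattern the original post describes.
public class YesBotSetup : MonoBehaviour
{
    void Awake()
    {
        var requester = GetComponent<DecisionRequester>();
        requester.DecisionPeriod = 1; // request a decision every step
        requester.TakeActionsBetweenDecisions = true;
    }
}
```

Alternatively, you can skip the DecisionRequester component entirely and call RequestDecision() on the agent yourself whenever you want it to act.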