MoreOrLess game solved by ML Agent

Hello, I recently started with AI in Unity and I wanted to begin with something simple: the game of more or less:

  • a random number is chosen by the game within a given interval
  • the AI must find this number in as few moves as possible
  • the AI proposes a number, and the game tells it whether the proposal is the mystery number, or whether the mystery number is higher or lower
  • if the proposed number matches, the AI has won and the game is over
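As a minimal, framework-free sketch of those rules (Python here rather than the project's Unity C#; the names `play_round` and `guess_fn` are illustrative, not part of any API):

```python
import random

def play_round(guess_fn, low=0, high=100, max_turns=200):
    """Play one game of more-or-less.

    guess_fn(feedback) returns the next proposal; feedback is None on the
    first turn, otherwise 'more' or 'less' relative to the mystery number.
    Returns the number of proposals used."""
    mystery = random.randint(low, high)   # the game picks a random number
    feedback = None
    for turn in range(1, max_turns + 1):
        guess = guess_fn(feedback)
        if guess == mystery:
            return turn                   # exact match: the game is over
        feedback = 'more' if mystery > guess else 'less'
    return max_turns
```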

To do this I have set 3 possible rewards:

  1. the AI wins the game and is rewarded with 1000 points

  2. the AI proposes a number outside the given interval and loses 10 points

  3. the AI proposes a number in the given interval and loses only 1 point

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        int input = (int) (vectorAction[0] * 100);
        _game.Userinput = input;

        if (_game.IsFound())                        // hypothetical win-check helper; this branch was cut from the snippet
        {
            AddReward(1000f);                       // reward 1: the mystery number was found
            Done();
        }
        else
        {
            AddReward(_game.IsOut() ? -10f : -1f);  // reward 2: outside the interval; reward 3: valid guess
            if (_game.IsMore())
                _min = input;                       // mystery number is higher: raise the lower bound
            else if (_game.IsLess())
                _max = input;                       // mystery number is lower: lower the upper bound
        }
    }

I am using the min and max values as the data collected in the vector observation.

Knowing that the mystery number is randomly generated between 0 and 100, the average proposal should tend towards the middle of the interval (50), and that is what the network does (even after 500,000 iterations it doesn’t quite reach it, but it tends towards that number). The problem is that the network still needs 50 to 150 proposals before finding the exact number, which remains far too many.

With human logic we would split the interval in two to learn whether the mystery number is above or below, and repeat. Example:

  • mystery number = 20
  • 1st proposal = 50 => it’s less
  • 2nd proposal = 25 => it’s less
  • 3rd proposal = 12 => it’s more
  • 4th proposal = 18 => it’s more
  • 5th proposal = 22 => it’s less
  • 6th proposal = 20 => that’s right
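The halving strategy described above is plain binary search, which on [0, 100] is guaranteed to find the number in at most 7 proposals. A quick illustrative sketch (Python, hypothetical function name):

```python
def binary_guess(mystery, low=0, high=100):
    """Halve the interval until the mystery number is found.

    Returns the list of proposals made along the way."""
    guesses = []
    while True:
        guess = (low + high) // 2          # midpoint of the remaining interval
        guesses.append(guess)
        if guess == mystery:
            return guesses
        if mystery > guess:
            low = guess + 1                # "it's more": drop the lower half
        else:
            high = guess - 1               # "it's less": drop the upper half
```

For mystery = 20 this produces [50, 24, 11, 17, 20], i.e. 5 proposals; the sequence differs slightly from the hand-worked example above because the sketch always picks the exact midpoint and excludes each rejected guess from the interval.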

In all, no more than 7 proposals are needed with an optimal halving strategy. If you have any hints to help me reduce the number of proposals the network makes, I’d appreciate it :)

PS: I have already tried training with several networks to go faster, but it generates too much entropy and the result is a failure.

Hey there,
Well, there are some things you should check:
The observation vector should probably contain 3 values: the last 2 values chosen, plus a flag that is 1 or 0 indicating whether the last chosen value was higher or lower than the target number.

Note that after a reset you have to set an observation vector that will not result in wrongly learned behaviour. So after each reset, set the observation values to [-1, -1, -1] before you request a new action.
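One framework-free sketch of that three-value observation with the [-1, -1, -1] reset sentinel (Python rather than the project's C#; all names here are illustrative, not ML-Agents API):

```python
RESET_OBS = [-1.0, -1.0, -1.0]   # sentinel observation used right after a reset

def build_observation(last_guess, prev_guess, target_was_higher):
    """Observation = [last guess, guess before that, higher/lower flag],
    with guesses normalised into [0, 1] and -1 meaning 'no guess yet'."""
    if last_guess is None:                           # fresh episode: nothing guessed
        return list(RESET_OBS)
    return [
        last_guess / 100.0,
        -1.0 if prev_guess is None else prev_guess / 100.0,
        1.0 if target_was_higher else 0.0,           # 1 = mystery number was higher
    ]
```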

Let me know if it works out. Also check the hyperparameter tuning documentation for suitable values.

Thanks a lot for your advice. Unfortunately I can’t test this before Monday, so I’ll keep you posted :)

It also gave me an idea for a problem I observed: sometimes the network never finds the solution, I suspect because it lacks curiosity. It proposes only negative values (which are outside the interval) and keeps digging deeper into them for the whole 50,000 iterations without ever proposing valid values. So I think I should add a limit on the number of tries per game to limit the damage.