[Not Solved] Training agents without movement

Hello there,
I am working on a project that consists of a matrix in which each cube is an agent. The cubes do not move; they only change their state (Material). A dead cube is white, and a living cube belongs to one of three races: black, green or red. The goal is to obtain a matrix in which all three races are alive and no cube is dead.
To do this I give a reward of -10 if the agent is dead, and several other rewards if it is alive. These depend on whether the cubes around a cube are of the same colour; for example, a cube that has turned red but has 8 green cubes around it is punished.
For the observations, I give the agent the colours of its neighbours as numbers (dead=0, black=1, green=2, red=3). The neighbours are the cubes above, to the left, to the right, etc., so that makes 8 observations plus the colour of the agent itself, 9 in total.
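A common alternative to passing each colour as a single number is one-hot encoding, since 0 to 3 are categories rather than magnitudes. The sketch below is plain C# with hypothetical names (`ColourEncoding`, `OneHot`, `EncodeNeighbourhood`). It assumes 5 categories: the four states above plus 4 for a missing neighbour at the border, which is the fallback value the agent code further down uses. Inside an agent, `VectorSensor.AddOneHotObservation(value, range)` from ML-Agents should cover the per-cell part.

```csharp
using System;

public static class ColourEncoding
{
    // Assumption: 0 = dead, 1 = black, 2 = green, 3 = red, 4 = no neighbour (grid border).
    public const int Categories = 5;

    // Returns a vector with a single 1 at the position of the given colour code.
    public static float[] OneHot(int colour)
    {
        if (colour < 0 || colour >= Categories)
            throw new ArgumentOutOfRangeException(nameof(colour));
        var v = new float[Categories];
        v[colour] = 1f;
        return v;
    }

    // Concatenates the one-hot vectors of the agent's own cell and its neighbours.
    public static float[] EncodeNeighbourhood(int[] colours)
    {
        var obs = new float[colours.Length * Categories];
        for (int i = 0; i < colours.Length; i++)
            Array.Copy(OneHot(colours[i]), 0, obs, i * Categories, Categories);
        return obs;
    }

    public static void Main()
    {
        // Own colour red (3), then 8 neighbours; 4 marks a missing neighbour.
        int[] cell = { 3, 2, 2, 0, 1, 4, 4, 4, 0 };
        float[] obs = EncodeNeighbourhood(cell);
        Console.WriteLine(obs.Length);                    // 45
        Console.WriteLine(string.Join(",", OneHot(2)));   // 0,0,1,0,0
    }
}
```

With this encoding the 9 observations become 9 × 5 = 45 floats, and the network no longer has to learn that "red = 3" is not three times "black = 1".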
With all this, the agents learn to all take the same colour to maximise the reward. To change this, I am going to give the entropy of all the cubes as a reward, so that the agents also try to maximise the entropy, that is, to keep as many races alive as possible. The problem is: what do I give as an observation for this? If I add the entropy reward and give as observations the colour (the number) of all the cubes in the matrix, my reward does not increase, it stays linear.
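To make the entropy idea concrete, here is a minimal sketch of the normalised Shannon entropy of the living races, H = -Σ p_i log_3(p_i) with p_i the share of race i among the living cubes. It takes a histogram laid out as [dead, black, green, red], the same layout the code below uses; the class and method names are illustrative. With base 3 the value is 0 when only one race is alive and about 1 when the three races are equally represented, so a board of a single colour earns nothing from this term.

```csharp
using System;

public static class RaceEntropy
{
    // histogram = { dead, black, green, red }; dead cubes are excluded from the probabilities.
    public static double Normalised(int[] histogram)
    {
        int alive = 0;
        for (int i = 1; i < histogram.Length; i++) alive += histogram[i];
        if (alive == 0) return 0.0;   // no living cubes: define the entropy as 0

        double h = 0.0;
        for (int i = 1; i < histogram.Length; i++)
        {
            if (histogram[i] == 0) continue;   // 0 * log(0) is taken as 0
            double p = (double)histogram[i] / alive;
            h -= p * Math.Log(p, 3);
        }
        return h;
    }

    public static void Main()
    {
        Console.WriteLine(Normalised(new[] { 0, 144, 0, 0 }));   // one race only: 0
        Console.WriteLine(Normalised(new[] { 0, 48, 48, 48 }));  // balanced: approximately 1
        Console.WriteLine(Normalised(new[] { 144, 0, 0, 0 }));   // all dead: 0
    }
}
```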
I also have some questions about ML-Agents:

  • Is it possible to do a project of this type? All the projects that I have seen involve movement.
  • I compute the entropy in FixedUpdate so that all the cubes see the same value, because if I calculate it in the agent's action, the entropy changes from cube to cube. Is this a good idea?
  • I am also trying to add a reward that represents a race. Any ideas?
  • If I give as observations the colour (the number) of all the cubes in the matrix, the vector observation size is 153. Is this a problem?
  • Some time ago, when I gave the entropy as a reward, the reward graph went up and then down again, like a mountain. Why is this?
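
Regarding the questions about the per-step entropy and the size of the observation vector, one possibility (a sketch under my own assumptions, not something taken from the ML-Agents documentation) is to count the board once per step in a shared object and let every cube read the same cached values. The hypothetical `BoardStats` class below is plain C# so it can be tried outside Unity; `Refresh` would be called with the current step number, and the four fractions could serve as a compact global observation in place of the 144 raw cell values.

```csharp
using System;

public sealed class BoardStats
{
    private readonly int[] counts = new int[4];   // dead, black, green, red
    private int total;
    private int lastStep = -1;

    // Recounts the board at most once per step; repeated calls in the same step are no-ops.
    public void Refresh(int step, int[] cells)
    {
        if (step == lastStep) return;
        lastStep = step;
        Array.Clear(counts, 0, counts.Length);
        total = 0;
        foreach (int c in cells)
        {
            if (c < 0 || c > 3) continue;   // ignore unknown codes
            counts[c]++;
            total++;
        }
    }

    // Fraction of cells in each state: 4 floats regardless of the board size.
    public float[] Fractions()
    {
        var f = new float[4];
        if (total == 0) return f;
        for (int i = 0; i < 4; i++) f[i] = (float)counts[i] / total;
        return f;
    }
}

public static class Demo
{
    public static void Main()
    {
        var stats = new BoardStats();
        int[] board = { 0, 1, 1, 2, 3, 3, 3, 0 };
        stats.Refresh(step: 0, cells: board);
        board[0] = 1;                          // a cube changes during the step
        stats.Refresh(step: 0, cells: board);  // same step: the cached counts are kept
        // Fractions are 2/8 dead, 2/8 black, 1/8 green, 3/8 red.
        Console.WriteLine(string.Join(" ", stats.Fractions()));
    }
}
```

Because every cube reads the counts taken at the start of the step, the value does not depend on the order in which the agents act.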

I know this is a lot of information, so I will be waiting for your response.

Here is my code if it helps.
Agent Code

using UnityEngine;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class ConwayCube : Agent
{
    [Header("Which colour is it?")]
    public int tipo;
    [Header("Time between decisions")]
    public float timeBetweenDecisionsAtInference;
    [Header("Materials")]
    public Material dead;
    public Material black;
    public Material green;
    public Material red;
    [HideInInspector]
    public Material actualmaterial;
    [HideInInspector]
    public bool contado;
    private float m_TimeSinceDecision;
    private new Rigidbody rigidbody;
    private Material cubeBack, cubeForward, cubeRight, cubeLeft, cubeForwardRight, cubeForwardLeft, cubeBackRight, cubeBackLeft;
    private Histograma histograma;
    private float entropiaC, entropiaN, entropiaV, entropiaR;
    private int colores0, coloresN, coloresV, coloresR;
    private int coloresNT, coloresVT, coloresRT;

    public override void Initialize()
    {
        histograma = gameObject.GetComponentInParent<Histograma>();
        entropiaC = getEntropyCompleta2(histograma.getHistogramaCompleto());
        rigidbody = GetComponent<Rigidbody>();
        ChooseRandomColor();
        base.Initialize();
        entropiaN = getEntropyVecinos(black);
        entropiaV = getEntropyVecinos(green);
        entropiaR = getEntropyVecinos(red);
        int[] colores = CuentaColores();
        colores0 = colores[0];
        coloresN = colores[1];
        coloresV = colores[2];
        coloresR = colores[3];
        /*if (tipo==1)
        {
            EstaNegro();
        }else if (tipo==2)
        {
            EstaVerde();
        }else if (tipo==3)
        {
            EstaRojo();
        }else
        {
            EstaMuerto();
        }*/
    }

    public void EstaMuerto()
    {
        gameObject.GetComponentInChildren<Renderer>().material = dead;
        actualmaterial = dead;
    }
    public void EstaRojo()
    {
        gameObject.GetComponentInChildren<Renderer>().material = red;
        actualmaterial = red;
    }
    public void EstaVerde()
    {
        gameObject.GetComponentInChildren<Renderer>().material = green;
        actualmaterial = green;
    }
    public void EstaNegro()
    {
        gameObject.GetComponentInChildren<Renderer>().material = black;
        actualmaterial = black;
    }


    public override void CollectObservations(VectorSensor sensor)
    {
        List<Material> lista = new List<Material>();
        lista.Add(actualmaterial);
        lista.Add(cubeForward);
        lista.Add(cubeForwardRight);
        lista.Add(cubeRight);
        lista.Add(cubeBackRight);
        lista.Add(cubeBack);
        lista.Add(cubeBackLeft);
        lista.Add(cubeLeft);
        lista.Add(cubeForwardLeft);
        List<float> listanum = new List<float>();
        foreach (Material material in lista)
        {
            if (material == dead)
            {
                //AddVectorObs(0);
                listanum.Add(0);
            }
            else if (material == black)
            {
                //AddVectorObs(1);
                listanum.Add(1);
            }
            else if (material == green)
            {
                //AddVectorObs(2);
                listanum.Add(2);
            }
            else if (material == red)
            {
                //AddVectorObs(3);
                listanum.Add(3);
            }
            else
            {
                //AddVectorObs(4);
                listanum.Add(4);
            }
        }
        foreach (int num in listanum)
        {
            sensor.AddObservation(num);
        }
        foreach (int hist in histograma.getListaCompleto())
        {
            sensor.AddObservation(hist);
        }
        //sensor.AddObservation(histograma.getHistogramaColorTrampas(actualmaterial));
        //sensor.AddObservation(entropiaN);
        //sensor.AddObservation(entropiaV);
        //sensor.AddObservation(entropiaR);
    }
    // Normalised entropy (log base 3) of the three living races; dead cubes are excluded.
    public float getEntropyCompleta2(int[] histogram)
    {
        double entropia = 0;
        int muertos = histogram[0];
        int vivos = 144 - muertos;
        // With no living cubes the division below would be 0/0 (NaN), so return 0.
        if (vivos <= 0)
        {
            return 0f;
        }

        for (int i = 1; i < histogram.Length; i++)
        {
            double p = (double)histogram[i] / vivos;
            if (p != 0)
            {
                entropia += p * Math.Log(p, 3);
            }
        }
        return -(float)entropia;
    }
    // TODO: move to the Histograma class
    public float getEntropyCompleta(int[] histogram)
    {
        double[] lista = new double[histogram.Length];
        double entropia = 0;
        int muertos = histogram[0];

        for (int i = 0; i < histogram.Length; i++)
        {
            lista[i] = histogram[i];
            lista[i] = lista[i] / (144);
            if (lista[i] != 0)
            {
                entropia += (lista[i] * Math.Log(lista[i], 4));
            }
        }
        return -(float)entropia;
    }
    public float getEntropyVecinos(Material color)
    {
        float[] histogram = histograma.getHistogramaColorTrampas(color);
        int vecinos = histograma.getNumeroVecinos(color);

        int num;

        if (color == dead)
        {
            num = 0;
        }
        else if (color == black)
        {
            num = 1;
        }
        else if (color == green)
        {
            num = 2;
        }
        else if (color == red)
        {
            num = 3;
        }
        else
        {
            num = 4;
        }
        float aux2 = histogram[num];
        float aux1 = vecinos;
        float entropia = (aux2) / (aux1);
        if (aux2 == 0)
        {
            entropia = -10;
        }
        return entropia;
    }
    public override void OnActionReceived(float[] vectorAction)
    {
        var color = (int)vectorAction[0];
        contado = true;
        if (coloresNT == 48 && coloresRT == 48 && coloresVT == 48)
        {
            // Goal reached: the three races are perfectly balanced.
            AddReward(50f);
            EndEpisode();
        }
        else if (actualmaterial == dead && coloresV == 0 && coloresR == 0 && coloresN != 0)
        {
            // A dead cube whose only living neighbours are black is forced to black.
            EstaNegro();
            AddReward(0.00001f);
        }
        else if (actualmaterial == dead && coloresN == 0 && coloresR == 0 && coloresV != 0)
        {
            EstaVerde();
            AddReward(0.00001f);
        }
        else if (actualmaterial == dead && coloresV == 0 && coloresN == 0 && coloresR != 0)
        {
            EstaRojo();
            AddReward(0.00001f);
        }
        else if (actualmaterial == dead && coloresV == 0 && coloresR == 0 && coloresN == 0)
        {
            // No living neighbours: the cube stays dead.
            EstaMuerto();
            AddReward(-10f);
        }
        else
        {
            // In every other case the chosen action decides the new colour.
            AplicaColor(color);
            if (actualmaterial == dead)
            {
                AddReward(-10f);
            }
            else if (actualmaterial == black)
            {
                AddReward(RecompensaVecinos(coloresN, coloresR, coloresV));
            }
            else if (actualmaterial == green)
            {
                AddReward(RecompensaVecinos(coloresV, coloresR, coloresN));
            }
            else if (actualmaterial == red)
            {
                AddReward(RecompensaVecinos(coloresR, coloresN, coloresV));
            }
        }
        // If there are none of that colour, there is nothing to subtract from.
        // Add them up instead?
        /*AddReward((entropiaN) / 10);
        AddReward(entropiaV / 10);
        AddReward(entropiaR / 10);*/
        AddReward(entropiaC / 2);
    }

    // Sets the material that corresponds to the action (0 = dead, 1 = black, 2 = green, 3 = red).
    private void AplicaColor(int color)
    {
        switch (color)
        {
            case 0:
                EstaMuerto();
                break;
            case 1:
                EstaNegro();
                break;
            case 2:
                EstaVerde();
                break;
            case 3:
                EstaRojo();
                break;
        }
    }

    // Neighbour-based reward for a living cube. "propios" counts the neighbours of the cube's
    // own colour; "a" and "b" count the two other races. Ties match no branch and give no reward.
    private float RecompensaVecinos(int propios, int a, int b)
    {
        if (propios > a && propios > b && propios > colores0) return 0.001f;
        if (propios > a && propios > b && propios < colores0) return 0.0001f;
        if (propios > a && propios < b) return a == 0 ? -1f : -0.1f;
        if (propios < a && propios > b) return b == 0 ? -1f : -0.1f;
        if (propios < a && propios < b) return -1f;
        return 0f;
    }

    // Counts the colours of the 8 neighbours: { dead, black, green, red }.
    public int[] CuentaColores()
    {
        int[] colores = new int[4];
        cubeForward = MiraVecino(Vector3.forward, 1f, cubeForward, colores);
        cubeBack = MiraVecino(Vector3.back, 1f, cubeBack, colores);
        cubeRight = MiraVecino(Vector3.right, 1f, cubeRight, colores);
        cubeLeft = MiraVecino(Vector3.left, 1f, cubeLeft, colores);
        // Diagonal neighbours are further away, so these rays use a longer range.
        cubeForwardRight = MiraVecino(new Vector3(1, 0, 1), 2f, cubeForwardRight, colores);
        cubeForwardLeft = MiraVecino(new Vector3(-1, 0, 1), 2f, cubeForwardLeft, colores);
        cubeBackRight = MiraVecino(new Vector3(1, 0, -1), 2f, cubeBackRight, colores);
        cubeBackLeft = MiraVecino(new Vector3(-1, 0, -1), 2f, cubeBackLeft, colores);
        return colores;
    }

    // Casts a ray in the given direction. If it hits a cube, the cube's colour is counted and its
    // material returned; otherwise the previously stored material is kept unchanged.
    private Material MiraVecino(Vector3 direction, float distance, Material previous, int[] colores)
    {
        RaycastHit hit;
        if (!Physics.Raycast(new Ray(transform.position, direction), out hit, distance))
        {
            return previous;
        }
        if (!hit.collider.CompareTag("cube"))
        {
            return previous;
        }
        Material vecino = hit.collider.gameObject.GetComponentInChildren<ConwayCube>().actualmaterial;
        if (vecino == dead)
        {
            colores[0]++;
        }
        else if (vecino == black)
        {
            colores[1]++;
        }
        else if (vecino == green)
        {
            colores[2]++;
        }
        else if (vecino == red)
        {
            colores[3]++;
        }
        return vecino;
    }

    public void ChooseRandomColor()
    {
        if (UnityEngine.Random.Range(0, 24) == 3)
        {
            switch (UnityEngine.Random.Range(0, 3))
            {
                case 0:
                    EstaNegro();
                    break;
                case 1:
                    EstaVerde();
                    break;
                case 2:
                    EstaRojo();
                    break;
            }
        }
        else
        {
            EstaMuerto();
        }

    }

    public override void OnEpisodeBegin()
    {
        ChooseRandomColor();
        /*if (tipo==1)
        {
            EstaNegro();
        }
        else if (tipo==2)
        {
            EstaVerde();
        }
        else if (tipo==3)
        {
            EstaRojo();
        }
        else
        {
            EstaMuerto();
        }*/
    }

    public void FixedUpdate()
    {
        WaitTimeInference();
        histograma = gameObject.GetComponentInParent<Histograma>();
        int[] completo = histograma.getHistogramaCompleto();
        int[] colores = CuentaColores();
        colores0 = colores[0];
        coloresN = colores[1];
        coloresV = colores[2];
        coloresR = colores[3];
        coloresNT = completo[1];
        coloresVT = completo[2];
        coloresRT = completo[3];
        entropiaC = getEntropyCompleta2(completo);
        entropiaN = getEntropyVecinos(black);
        entropiaV = getEntropyVecinos(green);
        entropiaR = getEntropyVecinos(red);
        //Debug.Log("EntropiaC: " + entropiaC + "EntropiaN: " + entropiaN+ "EntropiaV: " + entropiaV+"EntropiaR: " + entropiaR);
    }

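    // Note: as written, decisions are throttled by timeBetweenDecisionsAtInference while a
    // trainer is connected, and requested on every FixedUpdate otherwise.
    void WaitTimeInference()
    {
        if (!Academy.Instance.IsCommunicatorOn)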
        {
            RequestDecision();
        }
        else
        {
            if (m_TimeSinceDecision >= timeBetweenDecisionsAtInference)
            {
                m_TimeSinceDecision = 0f;
                RequestDecision();
            }
            else
            {
                m_TimeSinceDecision += Time.fixedDeltaTime;
            }
        }
    }

}

Area Code

using System.Collections;
using System.Collections.Generic;
using MLAgentsExamples;
using UnityEngine;
using System.Linq;
using TMPro;

public class Histograma : MonoBehaviour
{

    public Material dead;
    public Material black;
    public Material green;
    public Material red;
    [Tooltip("The TextMeshPro text that shows the cumulative reward of the agent")]
    public TextMeshPro cumulativeRewardText;

    public void Update()
    {
        float valor = 0;
        ConwayCube[] lista = gameObject.GetComponentsInChildren<ConwayCube>();
        foreach (ConwayCube cubo in lista)
        {
            valor += cubo.GetCumulativeReward();
        }
        valor /= 144;
        cumulativeRewardText.text = valor.ToString("0.00");
    }

    public List<int> getListaCompleto()
    {
        List<int> histograma = new List<int>();
        ConwayCube[] lista = gameObject.GetComponentsInChildren<ConwayCube>();
        foreach (ConwayCube cubo in lista)
        {
            if (cubo.actualmaterial == dead)
            {
                histograma.Add(0);
            }
            else if (cubo.actualmaterial == black)
            {
                histograma.Add(1);
            }
            else if (cubo.actualmaterial == green)
            {
                histograma.Add(2);
            }
            else if (cubo.actualmaterial == red)
            {
                histograma.Add(3);
            }
            else
            {
                histograma.Add(4);
            }
        }
        return histograma;
    }
    public int[] getHistogramaCompleto()
    {
        int[] histograma = getListaCompleto().ToArray();
        int colores0 = 0, coloresR = 0, coloresV = 0, coloresN = 0;
        foreach (int cubo in histograma)
        {
            switch (cubo)
            {
                case 0: colores0++; break;
                case 1: coloresN++; break;
                case 2: coloresV++; break;
                case 3: coloresR++; break;
                case 4: break;
            }
        }
        int[] mapa = new int[4];
        mapa[0] = colores0;
        mapa[1] = coloresN;
        mapa[2] = coloresV;
        mapa[3] = coloresR;
        return mapa;
    }
    public int[] getHistogramaColor(Material color)
    {
        int colores0 = 0, coloresR = 0, coloresV = 0, coloresN = 0;
        int[] histograma = new int[4];
        ConwayCube[] lista = gameObject.GetComponentsInChildren<ConwayCube>();
        foreach (ConwayCube cubo in lista)
        {
            if (cubo.actualmaterial == color)
            {
                int[] colores = cubo.CuentaColores();
                colores0 += colores[0];
                coloresN += colores[1];
                coloresV += colores[2];
                coloresR += colores[3];
            }
        }
        histograma[0] = colores0;
        histograma[1] = coloresN;
        histograma[2] = coloresV;
        histograma[3] = coloresR;
        return histograma;
    }

    public ConwayCube[] getLista()
    {
        ConwayCube[] lista = gameObject.GetComponentsInChildren<ConwayCube>();
        return lista;
    }
    public int getNumeroVecinos(Material color)
    {
        int[] histograma = getListaCompleto().ToArray();
        int max = histograma.Length, num, valor = 0;
        if (color == dead)
        {
            num = 0;
        }
        else if (color == black)
        {
            num = 1;
        }
        else if (color == green)
        {
            num = 2;
        }
        else if (color == red)
        {
            num = 3;
        }
        else
        {
            num = 4;
        }
        for (int i = 0; i < max; i++)
        {
            if (histograma[i] == num)
            {
                valor++;
            }
        }
        return valor;

    }
    public float[] getHistogramaColorTrampas(Material color)
    {
        int colores0 = 0, coloresR = 0, coloresV = 0, coloresN = 0, num;
        int[] histograma = getListaCompleto().ToArray();
        int max = histograma.Length;
        int[] contado = new int[max];
        for (int i = 0; i < max; i++)
        {
            contado[i] = 0;
        }
        if (color == dead)
        {
            num = 0;
        }
        else if (color == black)
        {
            num = 1;
        }
        else if (color == green)
        {
            num = 2;
        }
        else if (color == red)
        {
            num = 3;
        }
        else
        {
            num = 4;
        }
        for (int i = 0; i < max; i++)
        {
            if (histograma[i] == num)
            {
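                if (i + 1 < max && (i + 1) % 12 != 0) // right neighbour; skipped in the last column so it does not wrap to the next row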
                {
                    if (contado[i + 1] == 0)
                    {
                        contado[i + 1] = 1;
                        switch (histograma[i + 1])
                        {
                            case 0: colores0++; break;
                            case 1: coloresN++; break;
                            case 2: coloresV++; break;
                            case 3: coloresR++; break;
                            case 4: break;
                        }
                    }
                }
                if (i + 11 < max)
                {
                    if (i % 12 != 0) // the original "i != 0 || ..." was true for every i except 0; skip the first column
                    {
                        if (contado[i + 11] == 0)
                        {
                            contado[i + 11] = 1;
                            switch (histograma[i + 11])
                            {
                                case 0: colores0++; break;
                                case 1: coloresN++; break;
                                case 2: coloresV++; break;
                                case 3: coloresR++; break;
                                case 4: break;
                            }
                        }
                    }
                }
                if (i + 12 < max)
                {
                    if (contado[i + 12] == 0)
                    {
                        contado[i + 12] = 1;
                        switch (histograma[i + 12])
                        {
                            case 0: colores0++; break;
                            case 1: coloresN++; break;
                            case 2: coloresV++; break;
                            case 3: coloresR++; break;
                            case 4: break;
                        }
                    }
                }
                if (i + 13 < max)
                {
                    if ((i + 1) % 12 != 0) // the original "i != 11 || ..." was true for every i except 11; skip the last column
                    {
                        if (contado[i + 13] == 0)
                        {
                            contado[i + 13] = 1;
                            switch (histograma[i + 13])
                            {
                                case 0: colores0++; break;
                                case 1: coloresN++; break;
                                case 2: coloresV++; break;
                                case 3: coloresR++; break;
                                case 4: break;
                            }
                        }
                    }
                }
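                if (i - 1 >= 0 && i % 12 != 0) // left neighbour; skipped in the first column so it does not wrap to the previous row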
                {
                    if (contado[i - 1] == 0)
                    {
                        contado[i - 1] = 1;
                        switch (histograma[i - 1])
                        {
                            case 0: colores0++; break;
                            case 1: coloresN++; break;
                            case 2: coloresV++; break;
                            case 3: coloresR++; break;
                            case 4: break;
                        }
                    }
                }
                if (i - 11 >= 0)
                {
                    if ((i + 1) % 12 != 0) // the original "i != 11 || ..." was true for every i except 11; skip the last column
                    {
                        if (contado[i - 11] == 0)
                        {
                            contado[i - 11] = 1;
                            switch (histograma[i - 11])
                            {
                                case 0: colores0++; break;
                                case 1: coloresN++; break;
                                case 2: coloresV++; break;
                                case 3: coloresR++; break;
                                case 4: break;
                            }
                        }
                    }
                }
                if (i - 12 >= 0)
                {
                    if (contado[i - 12] == 0)
                    {
                        contado[i - 12] = 1;
                        switch (histograma[i - 12])
                        {
                            case 0: colores0++; break;
                            case 1: coloresN++; break;
                            case 2: coloresV++; break;
                            case 3: coloresR++; break;
                            case 4: break;
                        }
                    }
                }
                if (i - 13 >= 0)
                {
                    if (i % 12 != 0) // the original "i != 0 || ..." was true for every i except 0; skip the first column
                    {

                        if (contado[i - 13] == 0)
                        {
                            contado[i - 13] = 1;
                            switch (histograma[i - 13])
                            {
                                case 0: colores0++; break;
                                case 1: coloresN++; break;
                                case 2: coloresV++; break;
                                case 3: coloresR++; break;
                                case 4: break;
                            }
                        }
                    }
                }
            }
        }
        float[] mapa = new float[4];
        mapa[0] = colores0;
        mapa[1] = coloresN;
        mapa[2] = coloresV;
        mapa[3] = coloresR;
        return mapa;
    }

}
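The eight repeated blocks in getHistogramaColorTrampas could probably be folded into one loop over row and column offsets. This is only a sketch: the class name `NeighbourCounter`, the method `CountNeighbourColours` and the `columns` parameter are hypothetical, and it assumes the list is stored row by row (12 cubes per row in the code above) with the codes 0 = dead, 1 = black, 2 = green, 3 = red.

```csharp
using System;

public static class NeighbourCounter
{
    // Assumption: grid is a flat, row-major array with `columns` cells per row.
    // Returns how many neighbouring cells of each colour (0..3) surround the cells
    // whose value equals `target`. Each neighbouring cell is counted at most once.
    public static int[] CountNeighbourColours(int[] grid, int columns, int target)
    {
        int rows = grid.Length / columns;
        bool[] counted = new bool[grid.Length];
        int[] histogram = new int[4];

        for (int i = 0; i < grid.Length; i++)
        {
            if (grid[i] != target) continue;
            int row = i / columns, col = i % columns;

            for (int dr = -1; dr <= 1; dr++)
            {
                for (int dc = -1; dc <= 1; dc++)
                {
                    if (dr == 0 && dc == 0) continue;           // the cell itself
                    int r = row + dr, c = col + dc;
                    if (r < 0 || r >= rows || c < 0 || c >= columns) continue; // off the grid
                    int j = r * columns + c;
                    if (counted[j]) continue;                    // already counted once
                    counted[j] = true;
                    int value = grid[j];
                    if (value >= 0 && value < histogram.Length) histogram[value]++;
                }
            }
        }
        return histogram;
    }
}
```

Working with row and column numbers instead of raw index offsets means the edge checks are the same for all eight directions, so a cube in the first or last column never picks up a neighbour from the row above or below by accident.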

Certainly reinforcement learning is not limited to movement. I am not sure I completely understand what you are trying to achieve. What behavior do you want each agent to learn?

The reward system you have is very complex. I think you should be able to start with +1 if alive, and -1 if dead. Or perhaps even just -1 if dead and 0 if alive. The placement of neighbors and their colors (and the optimal color to stay alive) is something the agents should learn on their own. You shouldn’t have to spell that out.
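As a rough sketch of that suggestion (the names `SimpleReward` and `Compute` are made up for illustration, and it assumes the 0 = dead encoding from the first post), the whole reward could be as small as this:

```csharp
public static class SimpleReward
{
    public const int Dead = 0; // assumption: 0 = dead, 1..3 = the three races

    // Returns deadPenalty for a dead cube and aliveReward for any living one.
    // Pass aliveReward = 0f for the "-1 if dead, 0 if alive" variant.
    public static float Compute(int colour, float aliveReward = 1f, float deadPenalty = -1f)
    {
        return colour == Dead ? deadPenalty : aliveReward;
    }
}
```

Inside the agent, the result would then be handed to the usual reward call, for example `AddReward(SimpleReward.Compute(color))`.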

Also, are you using Discrete Space Type?

var color = (int)vectorAction[0];

If you are using Continuous Space Type (which is the default), the input into OnActionReceived() is in the range of -1 to 1, which is probably not what you want.
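To illustrate why that matters, here is a small sketch (the class `ActionDecoding` and both method names are hypothetical, and it assumes four colours coded 0 to 3). With a discrete branch the cast gives the colour directly; with a continuous value in -1..1 the same cast collapses almost everything to 0, so the value would have to be rescaled into buckets first.

```csharp
using System;

public static class ActionDecoding
{
    public const int ColourCount = 4; // assumption: dead, black, green, red

    // Discrete space: the action already holds a whole number 0..3 stored as a float.
    public static int FromDiscrete(float action)
    {
        return (int)action;
    }

    // Continuous space: clamp to [-1, 1], rescale to [0, 1], then split into equal buckets.
    public static int FromContinuous(float action)
    {
        float clamped = Math.Max(-1f, Math.Min(1f, action));
        float unit = (clamped + 1f) / 2f;
        int index = (int)(unit * ColourCount);
        return Math.Min(index, ColourCount - 1); // action == 1 would otherwise give 4
    }
}
```

Note that `(int)0.7f` and `(int)-0.3f` both evaluate to 0, which with this encoding would mean "dead" for most continuous outputs.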

I want the agent to know its environment and to have information about the cubes of the same race. I want to reproduce how each race responds to the environment (whether it stays together as a group or maybe grows); to get there I will have to play with the rewards in the future. But for now I just want to know whether my agents can learn from the environment (using one agent per cube instead of making the whole matrix a single agent).

I have already done some runs with simple values and the graphs look good: the agent learns to stay in a color (meaning not dead). The graph:

5941589--636041--Screenshot 2020-06-05 at 00.50.59.png

The vector action is discrete. I think that for this project it is simpler to use the discrete Space Type, right?

I am also doing all the reward work to try to bring together as many cubes of the same race as possible. I would like to have three types of rewards: the agent's, which makes it avoid being dead and gather cubes of the same race; the environment's, which aims for as even a distribution of races as possible (I mean NOT only one race alive); and the race's, which has its own policies (this one is not implemented yet, and I have doubts about it).

Also, I am now training with the entropy of the histogram added as a reward, and there is something that bothers me. During training there is a point where it finds a maximum reward, but then as training continues the reward decreases and then stays at that value.
The maximum is reached when the entropy is almost 1, so all the races are alive in almost equal numbers. But then, even though it gets less reward, it learns to make everything the same colour and gets stuck there. This kind of graph (not exactly this one): Reward with the Entropy

5941589--636038--Screenshot 2020-06-05 at 01.08.07.png
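For reference, an entropy that tops out at 1 when all races are equally common could be computed roughly as below. This is a guess at the calculation, not the code used in the project: the name `RaceEntropy.Normalized` is hypothetical, and it assumes the input holds only the counts of living cubes per race (black, green, red), with dead cubes left out.

```csharp
using System;

public static class RaceEntropy
{
    // counts: number of living cubes per race, e.g. { black, green, red }.
    // Returns the Shannon entropy divided by log(number of races), so the result
    // is 0 when only one race is alive and 1 when all races are equally common.
    public static float Normalized(int[] counts)
    {
        int total = 0;
        foreach (int c in counts) total += c;
        if (total == 0 || counts.Length < 2) return 0f;

        double entropy = 0.0;
        foreach (int c in counts)
        {
            if (c == 0) continue;            // an empty race contributes nothing
            double p = (double)c / total;
            entropy -= p * Math.Log(p);
        }
        return (float)(entropy / Math.Log(counts.Length));
    }
}
```

With three races, two equally sized races and one extinct race give about 0.63, so under this formula losing one race costs roughly a third of the entropy reward.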

I hope someone can help me.

P.S.: In my last post I said: “If I put the entropy thing, and give as observation the color (the number) all the cubes in the matrix. my reward does not increase, it stays linear.” I just solved it. It was just a dumb thing.


In all honesty, it sounds like what you are implementing, to my understanding, doesn’t require reinforcement learning. Are you trying to mimic something like Conway's Game of Life (based on the names of your variables)?

For learning, it helps to have a specific goal that your agents need to achieve. It’s not clear (to me at least) what goal each of your agents is trying to achieve. From your description you seem to want to implement automata.

Can you clearly state what your agent should learn how to do? Some examples are:

“Learn how to throw a ball into a basket”
“Learn how to keep away from dangerous areas”
“Learn how to catch another agent”

You say you have many agents, but you want some sort of emergent behavior based on simple rules, which sounds very much like automata.

The first idea was to make something like Conway's Game of Life, but with multiple races, each with a different goal (one prefers to invade, one prefers to survive as long as possible). But right now it has nothing to do with that.

Learn how to survive in the environment. That's it. But I do not want just a +1 if alive, -1 if dead; I want to see interactions between races. The main goal is to see, by setting some parameters for each race, how each race interacts with the environment.
So right now the goal is closer to “Learn how to survive in the environment by grouping together (but without leaving only one race alive)”. (That is what I am implementing with the rewards, I think.)
I do not see this exercise as automata, because the way decisions are made is not even similar. There is only one rule; the other rules are not implemented, and the agent has to learn them itself.