ML-Agents: 3D Ball agent script explanation

Could someone explain this script used in the ML-Agents 3D Ball example?

I know how rewarding and punishing work, but I'm having trouble figuring out what's going on in AgentStep(). Also, what variables should be put in CollectState()?

This post could be very useful for other beginners.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Ball3DAgent : Agent
{
    [Header("Specific to Ball3D")]
    public GameObject ball;

    // The 8 values the brain sees every step: the z and x components of the
    // platform's rotation, the ball's position relative to the platform,
    // and the ball's velocity.
    public override List<float> CollectState()
    {
        List<float> state = new List<float>();
        state.Add(gameObject.transform.rotation.z);
        state.Add(gameObject.transform.rotation.x);
        state.Add((ball.transform.position.x - gameObject.transform.position.x));
        state.Add((ball.transform.position.y - gameObject.transform.position.y));
        state.Add((ball.transform.position.z - gameObject.transform.position.z));
        state.Add(ball.transform.GetComponent<Rigidbody>().velocity.x);
        state.Add(ball.transform.GetComponent<Rigidbody>().velocity.y);
        state.Add(ball.transform.GetComponent<Rigidbody>().velocity.z);
        return state;
    }

    // to be implemented by the developer
    // Called once per step; act[] holds the action(s) chosen by the brain.
    public override void AgentStep(float[] act)
    {
        // Continuous action space: act[0] and act[1] are tilt amounts
        // around the z and x axes, clamped to [-2, 2].
        if (brain.brainParameters.actionSpaceType == StateType.continuous)
        {
            float action_z = act[0];
            if (action_z > 2f)
            {
                action_z = 2f;
            }
            if (action_z < -2f)
            {
                action_z = -2f;
            }
            // Only tilt further while the platform's rotation stays within +/-0.25.
            if ((gameObject.transform.rotation.z < 0.25f && action_z > 0f) ||
                (gameObject.transform.rotation.z > -0.25f && action_z < 0f))
            {
                gameObject.transform.Rotate(new Vector3(0, 0, 1), action_z);

            }
            float action_x = act[1];
            if (action_x > 2f)
            {
                action_x = 2f;
            }
            if (action_x < -2f)
            {
                action_x = -2f;
            }
            if ((gameObject.transform.rotation.x < 0.25f && action_x > 0f) ||
                (gameObject.transform.rotation.x > -0.25f && action_x < 0f))
            {
                gameObject.transform.Rotate(new Vector3(1, 0, 0), action_x);
            }

            // Small positive reward for every step the ball is still balanced.
            if (done == false)
            {
                reward = 0.1f;
            }
        }
        else
        {
            // Discrete action space: act[0] is an index 0-3.
            // 0/1 tilt the platform around the z axis, 2/3 around the x axis,
            // each mapped to a change of -2 or +2.
            int action = (int)act[0];
            if (action == 0 || action == 1)
            {
                action = (action * 2) - 1;
                float changeValue = action * 2f;
                if ((gameObject.transform.rotation.z < 0.25f && changeValue > 0f) ||
                    (gameObject.transform.rotation.z > -0.25f && changeValue < 0f))
                {
                    gameObject.transform.Rotate(new Vector3(0, 0, 1), changeValue);
                }
            }
            if (action == 2 || action == 3)
            {
                action = ((action - 2) * 2) - 1;
                float changeValue = action * 2f;
                if ((gameObject.transform.rotation.x < 0.25f && changeValue > 0f) ||
                    (gameObject.transform.rotation.x > -0.25f && changeValue < 0f))
                {
                    gameObject.transform.Rotate(new Vector3(1, 0, 0), changeValue);
                }
            }
            // Same survival reward as in the continuous branch.
            if (done == false)
            {
                reward = 0.1f;
            }
        }
        // End the episode with a -1 reward if the ball has fallen more than 2 units
        // below the platform or drifted more than 3 units away along x or z.
        if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
            Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||
            Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f)
        {
            done = true;
            reward = -1f;
        }

    }

    // to be implemented by the developer
    // Start of a new episode: give the platform a small random tilt and
    // drop the ball from a random position above it.
    public override void AgentReset()
    {
        gameObject.transform.rotation = new Quaternion(0f, 0f, 0f, 0f);
        gameObject.transform.Rotate(new Vector3(1, 0, 0), Random.Range(-10f, 10f));
        gameObject.transform.Rotate(new Vector3(0, 0, 1), Random.Range(-10f, 10f));
        ball.GetComponent<Rigidbody>().velocity = new Vector3(0f, 0f, 0f);
        ball.transform.position = new Vector3(Random.Range(-1.5f, 1.5f), 4f, Random.Range(-1.5f, 1.5f)) + gameObject.transform.position;
    }
}

I think you are looking at some really old code. AgentStep isn't an overridable method in the 0.3+ versions, as far as I know.
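For anyone who lands on this thread while using 0.3 or 0.4: the same agent looks roughly like the sketch below. I'm writing it from memory, so treat the method names as approximate and check them against the docs linked further down. The idea is that CollectState() becomes CollectObservations() with AddVectorObs() calls, AgentStep() becomes AgentAction(), and the reward/done fields are replaced by AddReward(), SetReward() and Done(); AgentReset() keeps the same role.

using UnityEngine;
// (later versions may also need: using MLAgents;)

public class Ball3DAgent : Agent
{
    public GameObject ball;

    // Replaces CollectState(): push each observation instead of returning a List<float>.
    public override void CollectObservations()
    {
        AddVectorObs(gameObject.transform.rotation.z);
        AddVectorObs(gameObject.transform.rotation.x);
        AddVectorObs(ball.transform.position - gameObject.transform.position);
        AddVectorObs(ball.GetComponent<Rigidbody>().velocity);
    }

    // Replaces AgentStep(): vectorAction plays the role of act[].
    public override void AgentAction(float[] vectorAction, string textAction)
    {
        float actionZ = Mathf.Clamp(vectorAction[0], -2f, 2f);
        float actionX = Mathf.Clamp(vectorAction[1], -2f, 2f);

        if ((gameObject.transform.rotation.z < 0.25f && actionZ > 0f) ||
            (gameObject.transform.rotation.z > -0.25f && actionZ < 0f))
        {
            gameObject.transform.Rotate(new Vector3(0, 0, 1), actionZ);
        }
        if ((gameObject.transform.rotation.x < 0.25f && actionX > 0f) ||
            (gameObject.transform.rotation.x > -0.25f && actionX < 0f))
        {
            gameObject.transform.Rotate(new Vector3(1, 0, 0), actionX);
        }

        if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
            Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||
            Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f)
        {
            // Ball fell off: punish and end the episode.
            SetReward(-1f);
            Done();
        }
        else
        {
            // Ball is still balanced: small reward every step.
            AddReward(0.1f);
        }
    }
}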

Never mind, I see that your post is from a long time ago…

I still need help on this. Where can I find a good tutorial for ML-Agents?

Surajsirohi,
Here is lots of info on ml-agents (v0.4 for now, that's what I'm using): ml-agents/docs at main · Unity-Technologies/ml-agents · GitHub

And within those docs is a good example (the first one that I did): ml-agents/docs/Learning-Environment-Create-New.md at main · Unity-Technologies/ml-agents · GitHub


Thanks, I will check it out.


I tried the Roller Agent. Why does the training stop at 50,000 steps, and how can I increase the number of steps? After training I'm getting a mean reward of 2, which is too low.

You can change any part of the training in the "trainer_config.yaml" file (located in the Assets\ml_agents_master\python folder).
Look at some of the "brain names" in that file. Maybe even rename your "brain" GameObject so that it uses a different trainer section in that file. The trainer uses the default section, or the section whose name matches your brain GameObject's name.
You could copy one of the brain sections, paste it at the bottom of the file, and rename it to the name of the GameObject that has the Brain component on it.
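For example, if the GameObject with the Brain component were called RollerBallBrain (a made-up name, use whatever yours is), the bottom of trainer_config.yaml could get a section like the one below. The key names are the ones I remember from the 0.4 default PPO section, so check them against the copy in your project; anything you leave out falls back to the default section.

RollerBallBrain:
    trainer: ppo
    max_steps: 5.0e5      # default is 5.0e4, i.e. the 50000 steps you are hitting
    summary_freq: 1000    # how often a mean-reward summary line is written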


Thanks again, mate.


@surajsirohi1008 Note my edit above:

"You could copy one of the brain sections, paste it at the bottom of the file, and rename it to the name of the GameObject that has the Brain component on it."


Noted, thanks.


Also, note that you can modify the trainer file between runs. If you run it and it looks like you need another 100k steps or so, just edit the trainer YAML file to raise max_steps by that amount and run again. It will continue from the number of steps it did before up to the new max_steps. This works well when viewing the summaries in the browser too.
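For example (the brain name and run id below are made up, and the learn.py flags are from memory, so verify them in the training docs for your version): raise the limit in trainer_config.yaml, then start the trainer again with the same run id plus --load so it resumes from the saved model instead of starting from scratch.

RollerBallBrain:
    max_steps: 1.5e5   # was 5.0e4, so roughly another 100k steps

python learn.py --run-id=roller01 --train --load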


Thank you very much.