Experiment: Neural net training using Sentis Tensors

In this experiment I am attempting to use the Sentis Tensor framework to train a neural network inside Unity, to demonstrate that the same framework can be used for both inference and training. While this might not be particularly useful for games (as far as I can think), it might be good for educational purposes, or people could build a backpropagation framework that works in Unity as an alternative to TensorFlow or PyTorch. BTW, I just saw you can do math in the comments, which is cool. :+1:

Background
A neural network is just a function $x_{out} = f(\omega, x_{in})$, where $\omega$ stands for the weights.

To train a neural network, you just add more layers to the same graph that calculate a better set of weights from the current ones:

$$\omega_{new} = F(\omega_{old}, x_{in}, x_{expected})$$

As such, I decided to see if I could implement a backpropagation algorithm for a two-layer neural network of the form $x_{out} = \sigma(\omega_1 \sigma(\omega_2 x_{in}))$, where $\sigma(x)$ is the sigmoid function.
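Spelled out with the chain rule (a worked derivation under the definitions above, writing $h = \sigma(\omega_2 x_{in})$ for the hidden layer, $\odot$ for element-wise multiplication, and using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$), the gradients that the extra layers have to compute are:

$$
\begin{aligned}
E &= |e|^2, \qquad e = x_{expected} - x_{out} \\
\delta_1 &= e \odot x_{out} \odot (1 - x_{out}), \qquad \frac{\partial E}{\partial \omega_1} = -2\,\delta_1 h^{T} \\
\delta_2 &= (\omega_1^{T} \delta_1) \odot h \odot (1 - h), \qquad \frac{\partial E}{\partial \omega_2} = -2\,\delta_2 x_{in}^{T}
\end{aligned}
$$

A gradient-descent step $\omega \leftarrow \omega - \alpha \frac{\partial E}{\partial \omega}$ then becomes $\omega_1 \leftarrow \omega_1 + 2\alpha\,\delta_1 h^{T}$ and $\omega_2 \leftarrow \omega_2 + 2\alpha\,\delta_2 x_{in}^{T}$. In the C# code below, `W2` plays the role of $\omega_1$ and `W1` the role of $\omega_2$.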

The result: I trained the network to simulate an XOR gate. It’s probably not the most efficient algorithm; it takes about 20,000 steps to converge.

Here is the “proof” (the corners show the values of the XOR of the two input variables):

And here is the code:

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.Sentis;
using Lay = Unity.Sentis.Layers;

public class BackProp : MonoBehaviour
{
    public Model model;
    IWorker m_Engine;
    public Material mat;

    Tensor input, expected, learningRate;
    Tensor W1;
    Tensor W2;

    Texture2D tex;

    float learning_rate = 0.1f;
    int A = 3; // input size (two inputs plus a constant 1 that acts as a bias)
    int B = 3; // hidden size
    int C = 1; // output size
    int L = 8; // texture size

    void Start()
    {
        tex = new Texture2D(L, L, TextureFormat.RGB24, false);
        tex.wrapMode = TextureWrapMode.Clamp;
        model = new Model();
        mat.mainTexture = tex;

        learningRate = new TensorFloat(learning_rate);

        // ------------------ Set up the inference model ------------------
        input = new TensorFloat(new TensorShape(A, 1), new float[A]);
        float[] W1vals = new float[B * A];
        float[] W2vals = new float[C * B];
        for (int i = 0; i < W1vals.Length; i++) W1vals[i] = UnityEngine.Random.Range(-1f, 1f);
        for (int i = 0; i < W2vals.Length; i++) W2vals[i] = UnityEngine.Random.Range(-1f, 1f);
        W1 = new TensorFloat(new TensorShape(B, A), W1vals);
        W2 = new TensorFloat(new TensorShape(C, B), W2vals);
        model.AddInput("input", DataType.Float, new SymbolicTensorShape(input.shape));
        model.AddInput("W1", DataType.Float, new SymbolicTensorShape(W1.shape));
        model.AddLayer(new Lay.MatMul("mul1", "W1", "input"));
        model.AddLayer(new Lay.Sigmoid("sigmoid1", "mul1"));
        model.AddInput("W2", DataType.Float, new SymbolicTensorShape(W2.shape));
        model.AddLayer(new Lay.MatMul("mul2", "W2", "sigmoid1"));
        model.AddLayer(new Lay.Sigmoid("output", "mul2"));
        model.AddOutput("output");

        // ------------------ Calculate error from expected value ------------------
        model.AddInput("expected", DataType.Float, new SymbolicTensorShape(new[] { C, 1 }));
        model.AddLayer(new Lay.Sub("sub", "expected", "output"));
        model.AddLayer(new Lay.Square("square", "sub"));
        model.AddLayer(new Lay.ReduceSum("error", new string[] { "square" }, true));
        model.AddOutput("error");

        // ------------------ Calculate new weights using gradient descent ------------------
        // δW = -α dE/dW. Because "sub" is (expected - output) the minus sign turns into a "+",
        // and the factor 2 from the square is absorbed into the learning rate α.

        // The learning rate is a graph input rather than a constant: a constant is baked into the
        // model when it is built, so changing it later with the arrow keys would have no effect.
        model.AddInput("learningRate", DataType.Float, new SymbolicTensorShape(learningRate.shape));

        // Output layer: dW2 = (sub * output * (1 - output)) times sigmoid1^T
        model.AddLayer(new XTimesOneMinusX("diff_sigmoid2", "output"));
        model.AddLayer(new Lay.Mul("m2", "diff_sigmoid2", "sub"));
        model.AddLayer(new Lay.Transpose("sigmoid1_T", "sigmoid1", new int[] { 1, 0 }));
        model.AddLayer(new Lay.Mul("dW2", "m2", "sigmoid1_T")); // (C,1) * (1,B) broadcasts to an outer product
        model.AddLayer(new Lay.Mul("a_dW2", "learningRate", "dW2"));
        model.AddLayer(new Lay.Sum("newW2", new string[] { "W2", "a_dW2" }));
        model.AddOutput("newW2");

        // Hidden layer: back-propagate m2 through the old W2, then dW1 = m3 times input^T
        model.AddLayer(new XTimesOneMinusX("diff_sigmoid1", "sigmoid1"));
        model.AddLayer(new Lay.Transpose("W2_T", "W2", new int[] { 1, 0 }));
        model.AddLayer(new Lay.MatMul("W2_m2", "W2_T", "m2"));
        model.AddLayer(new Lay.Mul("m3", "diff_sigmoid1", "W2_m2"));
        model.AddLayer(new Lay.Transpose("input_T", "input", new int[] { 1, 0 }));
        model.AddLayer(new Lay.Mul("dW1", "m3", "input_T"));
        model.AddLayer(new Lay.Mul("a_dW1", "learningRate", "dW1"));
        model.AddLayer(new Lay.Sum("newW1", new string[] { "W1", "a_dW1" }));
        model.AddOutput("newW1");

        m_Engine = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    bool train = false;

    void Update()
    {
        // Down arrow halves and Up arrow doubles the learning rate; T toggles training.
        if (Input.GetKeyDown(KeyCode.DownArrow))
        {
            learning_rate /= 2f;
            learningRate.Dispose();
            learningRate = new TensorFloat(learning_rate);
        }
        if (Input.GetKeyDown(KeyCode.UpArrow))
        {
            learning_rate *= 2f;
            learningRate.Dispose();
            learningRate = new TensorFloat(learning_rate);
        }
        if (Input.GetKeyDown(KeyCode.T))
        {
            train = !train;
        }
        if (train) for (int k = 0; k < 100; k++) TrainOneStep(k);
    }

    int N = 0;

    void TrainOneStep(int k)
    {
        N++;
        input?.Dispose();
        expected?.Dispose();
        float[] inputVals = new float[A];
        int in0 = UnityEngine.Random.Range(0, 2);
        int in1 = UnityEngine.Random.Range(0, 2);
        inputVals[0] = in0;
        inputVals[1] = in1;
        inputVals[2] = 1f; // bias input
        input = new TensorFloat(new TensorShape(A, 1), inputVals);

        // What we want the output to be: the XOR of the two inputs
        float[] expectedVals = new float[C];
        expectedVals[0] = (float)(in1 ^ in0);
        expected = new TensorFloat(new TensorShape(C, 1), expectedVals);

        var m_Inputs = new Dictionary<string, Tensor>
        {
            {"input", input },
            {"W1", W1 },
            {"W2", W2 },
            {"expected", expected },
            {"learningRate", learningRate }
        };

        m_Engine.Execute(m_Inputs);
        if (k == 0)
        {
            var outputTensor = m_Engine.PeekOutput("output") as TensorFloat;
            float[] output = outputTensor.ToReadOnlyArray();
            Debug.Log($"{N} Expected:{string.Join(',', expectedVals)}\n Outputs:{string.Join(',', output)}\n");

            var errorTensor = m_Engine.PeekOutput("error") as TensorFloat;
            float[] error = errorTensor.ToReadOnlyArray();
            Debug.Log($"Error:{string.Join(',', error)}\n");
        }

        // Copy the updated weights out before the next Execute overwrites the outputs.
        var newW2 = m_Engine.PeekOutput("newW2") as TensorFloat;
        W2.Dispose();
        W2 = newW2.DeepCopy();

        var newW1 = m_Engine.PeekOutput("newW1") as TensorFloat;
        W1.Dispose();
        W1 = newW1.DeepCopy();
        if (k == 0) UpdateTexture();
    }

    void UpdateTexture()
    {
        byte[] data = new byte[L * L * 3];
        for (int x = 0; x < L; x++)
        {
            for (int y = 0; y < L; y++)
            {
                int n = 3 * (y * L + x);
                float v = InferOneStep(x * 1.0f / L, y * 1.0f / L);
                data[n] = (byte)Mathf.Clamp(v * 255, 0, 255);
                data[n + 1] = (byte)Mathf.Clamp(v * 255, 0, 255);
                data[n + 2] = (byte)Mathf.Clamp((1 - v) * 255, 0, 255);
            }
        }
        tex.LoadRawTextureData(data);
        tex.Apply();
    }

    float InferOneStep(float x, float y)
    {
        input?.Dispose();

        float[] inputVals = new float[A];
        inputVals[0] = x;
        inputVals[1] = y;
        inputVals[2] = 1f;

        input = new TensorFloat(new TensorShape(A, 1), inputVals);

        // The training nodes live in the same graph, so every graph input is supplied here too,
        // even though only "output" is read.
        var m_Inputs = new Dictionary<string, Tensor>
        {
            {"input", input },
            {"W1", W1 },
            {"W2", W2 },
            {"expected", expected },
            {"learningRate", learningRate }
        };

        m_Engine.Execute(m_Inputs);
        var outputTensor = m_Engine.PeekOutput("output") as TensorFloat;
        float[] output = outputTensor.ToReadOnlyArray();
        return output[0];
    }

    void CleanUp()
    {
        m_Engine?.Dispose();
        input?.Dispose();
        expected?.Dispose();
        learningRate?.Dispose();
        W1?.Dispose();
        W2?.Dispose();
        XTimesOneMinusX.one?.Dispose();
        XTimesOneMinusX.one = null;
    }

    private void OnDisable()
    {
        CleanUp();
    }
}

[System.Serializable]
public class XTimesOneMinusX : Lay.Layer
{
    // Shared constant 1, created on first use; BackProp.CleanUp disposes it and resets it to null.
    public static TensorFloat one;

    public XTimesOneMinusX(string name, string input)
    {
        this.name = name;
        inputs = new[] { input };
    }

    // Computes x * (1 - x): the sigmoid derivative expressed in terms of the sigmoid's output.
    public override Tensor Execute(Tensor[] inputs, ExecutionContext ctx)
    {
        var x = inputs[0] as TensorFloat;
        one ??= new TensorFloat(1);
        // Dispose the intermediate (1 - x) once the product is computed, so it is not leaked every step.
        using var oneMinusX = ctx.ops.Sub(one, x);
        return ctx.ops.Mul(x, oneMinusX);
    }
}
```

Feel free to criticise my code. I haven’t tried to optimise it.

Explanation
The model has an input of size 3. The first two numbers are from the set $\{0,1\}$ and the last one is always 1 (it acts as a bias for the hidden layer).
It has a hidden layer of size 3.
It has an output layer of size 1.

The error is calculated as $E = |x_{out} - x_{expected}|^2$.

The new weights are obtained by gradient descent, $\delta \omega = -\alpha \frac{\partial E}{\partial \omega}$, where $\alpha$ is the learning rate. (In the code, the factor of 2 from the square is absorbed into $\alpha$, and the minus sign is handled by computing `sub` as expected minus output.) Libraries like TensorFlow and PyTorch calculate all the terms of this expression automatically and turn them into additional nodes on the graph. Here I did it by hand, but it would be possible for someone to implement this automatically.

Conclusion
This was quite a fun project and I learned a lot about backpropagation. The Tensor framework works very well. One thing I noticed is that because nodes are referenced by string names, the compiler can’t check that you have named them all correctly; mistakes only show up at runtime. I suppose it would be good practice to store the names in constants.

It should be possible to implement an automatic backpropagation algorithm, so that it doesn’t have to be done by hand.

When building this neural network I realised that it would have been easier with a visual coding tool like Shader Graph. That might not be practical for massive networks, but it would be useful for smaller ones.

After building this model, I would like to see its graph (in Netron, for example). How would I do this?

Another question: how would you save your trained neural network?

To change the values of the inputs, I am disposing of them and then creating new ones. Is this the correct thing to do, or can I refill the tensor with new values?

Is there any advantage in building the model in layers or could I just make it one very complicated Layer with lots of “ctx.ops” functions?

THE END
Here is a video showing the error slowly decreasing as it converges:

Edit: fixed a spelling mistake for W2 shape size.


That’s super cool!
Genuinely impressed :slight_smile:

  • graph visualizer: we’d need to use the GraphTools Foundation package
  • saving models: see the other thread; this is done via ModelWriter/ModelLoader
  • you can modify tensors directly with the accessors, for example `tensor[0] = 0.0f;`
  • building the model via the graph is more or less analogous to using the ops. We could make graph building a lot more practical with operators and the like. The big advantage of the graph is that we can validate it and optimize it down. We have ModelOptimizer/ModelValidator classes that let you optimize your graph (but they are internal right now)

Just out of interest… Do you have any plans to implement training (a backward pass) with Sentis? (Maybe not for this release, but in a year or two’s time?) Or do you think this might be left for people to make as an Asset Store product or even an open-source GitHub project?

I think it would add a lot of value to have training and inference all implemented in the same language. This could almost be a competitor to TensorFlow/PyTorch for people beginning in AI. For example, I could see this being popular in schools as a way to introduce people to neural networks in a visual medium. It also seems useful to be able to import an ONNX file and then re-train it with new data.

I tried to implement a simple automatic differentiation algorithm for a few basic nodes (MatMul, Sum, Mul), which seems to work OK. However, for some steps (such as Transpose) I need to know the rank of the tensors. (I’m not sure yet how to do that before runtime; probably in a similar way to how the shapes of the output tensors are calculated, with a pre-pass.)

The basic idea was to have a function $K(A,B,x)$ which calculates $A^n \partial_x B^n$ using the recurrence $K(A,B(C),x) = K\left(A \cdot \frac{\partial B}{\partial C}, C, x\right)$, adding new nodes at each step. There’s probably a better way of doing it. One could check the results with numerical differentiation, using $\Delta x \approx 0.00001$ for example.

I can’t see any theoretical barrier, although TensorFlow/PyTorch are probably faster at this stage, and it would be a lot of work to implement all the many operators and special cases!

In terms of games, it might be useful for a game that learns from the user, perhaps it learns your voice or your playstyle or something. IDK. :slightly_smiling_face:


Yeah, I think the main thing is finding a use case to justify the dev time on it, really.
Maybe a GitHub repo with no official support would be a good start, yes.
The approach you gave is the correct one: we build the dual graph with automatic differentiation.
Why do you need the rank for the Transpose? The gradient of that is identity, no?


True, but the backward pass for MatMul involves a transpose on the last two axes, so it depends on the rank. MatMul2D supports transposing its inputs, but only if the tensors have the same shape, whereas with MatMul one of the tensors can have an additional batch dimension. Anyway, these are just technical issues that could be solved by calculating the shapes of the tensors beforehand.

You are right, this is probably a niche area, I just think it would be “nice” which is probably not enough reason to spend months implementing it! :slightly_smiling_face:

Thought this might be useful for people.

I found a Python script that creates a training model ONNX file.

https://onnxruntime.ai/docs/api/python/on_device_training/training_artifacts.html

So if you have an ONNX model and want to retrain it inside Unity, this would be useful. [Edit 2: It is quite useful, but there are a lot of custom operators to implement to make it work.]

```python
import onnx
from onnxruntime.training import artifacts

# Load the forward-only ONNX model
model = onnx.load("model.onnx")

# Placeholders: replace with your model's parameter names and your output folder
requires_grad = ["parameters", "needing", "gradients"]
frozen_params = ["parameters", "not", "needing", "gradients"]
path_to_output_artifact_directory = "training_artifacts"

# Generate the training artifacts
artifacts.generate_artifacts(model,
                             requires_grad=requires_grad,
                             frozen_params=frozen_params,
                             loss=artifacts.LossType.CrossEntropyLoss,
                             optimizer=artifacts.OptimType.AdamW,
                             artifact_directory=path_to_output_artifact_directory)
```

It creates a training ONNX model that you can use to train your model on the device. I haven’t tested it yet, so it might use some unsupported operators. Also, it only supports the Adam optimizer (AdamW here), which is not so bad, as Adam is a good general-purpose choice.

So while this doesn’t allow you to train models you wrote in C#, it does allow you to train any ONNX model you have, provided you give it the right parameters.

Not sure the point of all this. Why you’d want to do training within a game? I don’t know! :laughing:

Edit: I tried it out; unfortunately it usually results in unsupported operators, e.g. MaxPoolGrad, ConvGrad, ReluGrad, InPlaceAccumulatorV2 and “Conv: Only constant tensors are supported for W”.

Mind you, some of those could be implemented by the user. (Although that’s probably just as hard as implementing the whole backprop algorithm!) An interesting idea anyway :grinning:

Edit 2: I implemented some of the custom operators. For “InPlaceAccumulatorV2” I just pass through the second input, which gives the $\delta W$ values, and “TanhGrad” was not too hard (if I got it right). It actually works really well.


This is an updated video, where I made a few optimisations:

  • Replaced the sigmoid activations with tanh (this speeds it up a lot!)
  • Used batches, especially for the output (instead of running the network once for every pixel!). I should really use TensorToTexture, but that’s not a big bottleneck.
  • Changed the target from an XOR to a circle pattern

Basically it’s something like this: A Neural Network Playground

Not very useful but kind of neat (as they say in America).


That looks awesome!
I think training a network inside Unity has a lot of potential. In our case, we have an isometric game, and because our sorting code is not perfect, we need to manually “adjust” the sorting order in layer for some of the objects. As this process is very boring and repetitive, it could be a perfect job for the AI :laughing:.


Another little video in this series (made a while ago but didn’t get round to uploading it) so I’m doing it now as it’s raining again :cloud_with_lightning_and_rain::cloud_with_lightning_and_rain: :pensive::

Again creating the training ONNX using this script.

I needed to override the ConvGrad and MaxpoolGrad layers. MaxpoolGrad needs the second output of Maxpool (see here), so I had to do my best to construct it from other operators.

Starting from random model weights, it trains the model to categorise the MNIST characters in the test set. Every few seconds it tries to categorise the numbers onscreen into their respective columns.

(Best to skip ahead in the video; it’s not very interesting!)

Uses the Adam algorithm for training. Gets up to about 90% if you leave it for a few minutes.

It only uses batches of size 1. I don’t know how to make the script work for bigger batch sizes.

I wouldn’t say this example is particularly useful. It could be achieved just as well by training in Python and connecting to Unity using network code to display the graphics (as is done with ML-Agents).

But maybe in the future this might be useful if users want to for example, train a network in a game to recognise their particular voice or something like that. IDK, it remains to be seen…

BTW, one thing that tripped me up for ages was that I had created one of my NewOutputTensorFloat tensors as a NewTempTensorFloat instead. Easy mistake to make.


Thanks for sharing this!
