In this experiment I attempt to use the Sentis tensor framework to train a neural network inside Unity, to demonstrate that the same framework can be used for both inference and training. While this might not be particularly useful for games (as far as I can think), it could be good for educational purposes, or someone could build a backpropagation framework that works in Unity as an alternative to TensorFlow or PyTorch. BTW, I've just seen you can do maths in the comments, which is cool.

**Background**

A neural network is just a function x_{out} = f(\omega, x_{in}), where \omega stands for the weights.

To train a neural network you just add more layers to the graph that calculate a better set of weights:

\omega_{new} = F(\omega_{old},x_{in},x_{expected})

As such I decided to see if I could implement a back propagation algorithm for a two-layer neural network of the form x_{out} = \sigma(\omega_2 \sigma(\omega_1 x_{in})), where \sigma(x) is the sigmoid function.

My result is that I trained the network to simulate an XOR gate. It's probably not the most efficient algorithm, taking about 20,000 steps to converge.
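For anyone who wants to check the maths outside Unity first, here is a plain-Python sketch of the same idea (this is not Sentis code, just ordinary Python): the same layer sizes, the same bias-as-third-input trick, and the same hand-derived update rule, trained on XOR.

```python
import math
import random

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def forward(W1, W2, x):
    s1 = sigmoid(matvec(W1, x))    # hidden activations
    out = sigmoid(matvec(W2, s1))  # network output
    return s1, out

def train_step(W1, W2, x, expected, lr):
    """One hand-derived gradient step; uses sigma'(y) = y * (1 - y)."""
    s1, out = forward(W1, W2, x)
    sub = [e - o for e, o in zip(expected, out)]      # expected - output
    m2 = [o * (1 - o) * s for o, s in zip(out, sub)]
    back = matvec(transpose(W2), m2)                  # W2^T . m2
    m3 = [h * (1 - h) * b for h, b in zip(s1, back)]
    # W += lr * dW (the constant factor from dE is absorbed into lr)
    for i in range(len(W2)):
        for j in range(len(W2[i])):
            W2[i][j] += lr * m2[i] * s1[j]
    for i in range(len(W1)):
        for j in range(len(W1[i])):
            W1[i][j] += lr * m3[i] * x[j]

random.seed(1)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
W2 = [[random.uniform(-1, 1) for _ in range(3)]]
data = [([a, b, 1.0], [float(a ^ b)]) for a in (0, 1) for b in (0, 1)]
for _ in range(20000):
    x, y = random.choice(data)
    train_step(W1, W2, x, y, 0.5)
for x, y in data:
    print(x[:2], "->", round(forward(W1, W2, x)[1][0], 2))
```

With luck the four printed values end up near 0, 1, 1, 0; as with the Unity version, convergence depends on the random initialisation and the learning rate.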

Here is the "proof" (the corners show the values of the XOR of the two input variables):

And here is the code:

```
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using Unity.Sentis;
using Lay = Unity.Sentis.Layers;

public class BackProp : MonoBehaviour
{
    public Model model;
    IWorker m_Engine;
    public Material mat;
    Tensor input, expected, learningRate;
    Tensor W1;
    Tensor W2;
    Texture2D tex;
    float learning_rate = 0.1f;
    int A = 3; // input size
    int B = 3; // hidden size
    int C = 1; // output size
    int L = 8; // texture size

    void Start()
    {
        tex = new Texture2D(L, L, TextureFormat.RGB24, false);
        tex.wrapMode = TextureWrapMode.Clamp;
        model = new Model();
        mat.mainTexture = tex;
        learningRate = new TensorFloat(learning_rate);

        // ------------------Set up the inference model-----------------------------
        input = new TensorFloat(new TensorShape(A, 1), new float[A]);
        float[] W1vals = new float[B * A];
        float[] W2vals = new float[C * B];
        for (int i = 0; i < W1vals.Length; i++) W1vals[i] = UnityEngine.Random.Range(-1f, 1f);
        for (int i = 0; i < W2vals.Length; i++) W2vals[i] = UnityEngine.Random.Range(-1f, 1f);
        W1 = new TensorFloat(new TensorShape(B, A), W1vals);
        W2 = new TensorFloat(new TensorShape(C, B), W2vals);
        model.AddInput("input", DataType.Float, new SymbolicTensorShape(input.shape));
        model.AddInput("W1", DataType.Float, new SymbolicTensorShape(W1.shape));
        model.AddLayer(new Lay.MatMul("mul1", "W1", "input"));
        model.AddLayer(new Lay.Sigmoid("sigmoid1", "mul1"));
        model.AddInput("W2", DataType.Float, new SymbolicTensorShape(W2.shape));
        model.AddLayer(new Lay.MatMul("mul2", "W2", "sigmoid1"));
        model.AddLayer(new Lay.Sigmoid("output", "mul2"));
        model.AddOutput("output");

        //--------------------------Calculate error from expected value--------------------------------
        model.AddInput("expected", DataType.Float, new SymbolicTensorShape(new[] { C, 1 }));
        model.AddLayer(new Lay.Sub("sub", "expected", "output"));
        model.AddLayer(new Lay.Square("square", "sub"));
        model.AddLayer(new Lay.ReduceSum("error", new string[] { "square" }, true));
        model.AddOutput("error");

        //--------------------------Calculate new weights using gradient descent----------------------
        // Update rule: δW = α·dE/dW (the sign works out because "sub" is expected - output)
        model.AddConstant(new Lay.Constant("learningRate", learningRate));
        model.AddLayer(new XTimesOneMinusX("diff_sigmoid2", "output"));
        model.AddLayer(new Lay.Mul("m2", "diff_sigmoid2", "sub"));
        model.AddLayer(new Lay.Transpose("sigmoid1_T", "sigmoid1", new int[] { 1, 0 }));
        model.AddLayer(new Lay.Mul("dW2", "m2", "sigmoid1_T"));
        model.AddLayer(new Lay.Mul("a_dW2", "learningRate", "dW2"));
        model.AddLayer(new Lay.Sum("newW2", new string[] { "W2", "a_dW2" }));
        model.AddOutput("newW2");
        model.AddLayer(new XTimesOneMinusX("diff_sigmoid1", "sigmoid1"));
        model.AddLayer(new Lay.Transpose("W2_T", "W2", new int[] { 1, 0 }));
        model.AddLayer(new Lay.MatMul("W2_m2", "W2_T", "m2"));
        model.AddLayer(new Lay.Mul("m3", "diff_sigmoid1", "W2_m2"));
        model.AddLayer(new Lay.Transpose("input_T", "input", new int[] { 1, 0 }));
        model.AddLayer(new Lay.Mul("dW1", "m3", "input_T"));
        model.AddLayer(new Lay.Mul("a_dW1", "learningRate", "dW1"));
        model.AddLayer(new Lay.Sum("newW1", new string[] { "W1", "a_dW1" }));
        model.AddOutput("newW1");

        m_Engine = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    bool train = false;

    void Update()
    {
        if (Input.GetKeyDown(KeyCode.DownArrow))
        {
            learning_rate /= 2f;
            learningRate.Dispose();
            learningRate = new TensorFloat(learning_rate);
        }
        if (Input.GetKeyDown(KeyCode.UpArrow))
        {
            learning_rate *= 2f;
            learningRate.Dispose();
            learningRate = new TensorFloat(learning_rate);
        }
        if (Input.GetKeyDown(KeyCode.T))
        {
            train = !train;
        }
        if (train) for (int k = 0; k < 100; k++) TrainOneStep(k);
    }

    int N = 0;

    void TrainOneStep(int k)
    {
        N++;
        input?.Dispose();
        expected?.Dispose();
        float[] inputVals = new float[A];
        int in0 = UnityEngine.Random.Range(0, 2);
        int in1 = UnityEngine.Random.Range(0, 2);
        inputVals[0] = in0;
        inputVals[1] = in1;
        inputVals[2] = 1f; // bias input
        input = new TensorFloat(new TensorShape(A, 1), inputVals);
        float[] expectedVals = new float[C];
        //what we want the values to be:
        //-----------XOR------------------
        expectedVals[0] = (float)(in1 ^ in0);
        expected = new TensorFloat(new TensorShape(C, 1), expectedVals);
        var m_Inputs = new Dictionary<string, Tensor>
        {
            { "input", input },
            { "W1", W1 },
            { "W2", W2 },
            { "expected", expected }
        };
        m_Engine.Execute(m_Inputs);
        if (k == 0)
        {
            var outputTensor = m_Engine.PeekOutput("output") as TensorFloat;
            float[] output = outputTensor.ToReadOnlyArray();
            Debug.Log($"{N} Expected:{string.Join(',', expectedVals)}\n Outputs:{string.Join(',', output)}\n");
            var errorTensor = m_Engine.PeekOutput("error") as TensorFloat;
            float[] error = errorTensor.ToReadOnlyArray();
            Debug.Log($"Error:{string.Join(',', error)}\n");
        }
        var newW2 = m_Engine.PeekOutput("newW2") as TensorFloat;
        W2.Dispose();
        W2 = newW2.DeepCopy();
        var newW1 = m_Engine.PeekOutput("newW1") as TensorFloat;
        W1.Dispose();
        W1 = newW1.DeepCopy();
        if (k == 0) UpdateTexture();
    }

    void UpdateTexture()
    {
        byte[] data = new byte[L * L * 3];
        for (int x = 0; x < L; x++)
        {
            for (int y = 0; y < L; y++)
            {
                int n = 3 * (y * L + x);
                float v = InferOneStep(x * 1.0f / L, y * 1.0f / L);
                data[n] = (byte)Mathf.Clamp(v * 255, 0, 255);
                data[n + 1] = (byte)Mathf.Clamp(v * 255, 0, 255);
                data[n + 2] = (byte)Mathf.Clamp((1 - v) * 255, 0, 255);
            }
        }
        tex.LoadRawTextureData(data);
        tex.Apply();
    }

    float InferOneStep(float x, float y)
    {
        input?.Dispose();
        float[] inputVals = new float[A];
        inputVals[0] = x;
        inputVals[1] = y;
        inputVals[2] = 1f;
        input = new TensorFloat(new TensorShape(A, 1), inputVals);
        var m_Inputs = new Dictionary<string, Tensor>
        {
            { "input", input },
            { "W1", W1 },
            { "W2", W2 }
        };
        m_Engine.Execute(m_Inputs);
        var outputTensor = m_Engine.PeekOutput("output") as TensorFloat;
        float[] output = outputTensor.ToReadOnlyArray();
        // Debug.Log($"Inputs{string.Join(',', inputVals)}\n Outputs:{string.Join(',', output)}\n");
        return output[0];
    }

    void CleanUp()
    {
        m_Engine.Dispose();
        input?.Dispose();
        expected?.Dispose();
        W1?.Dispose();
        W2?.Dispose();
        XTimesOneMinusX.one?.Dispose();
    }

    private void OnDisable()
    {
        CleanUp();
    }
}

[System.Serializable]
public class XTimesOneMinusX : Lay.Layer
{
    // Computes x(1-x), i.e. the sigmoid derivative expressed in terms of the sigmoid's output
    public static TensorFloat one = new TensorFloat(1); // potential leak here
    public XTimesOneMinusX(string name, string input)
    {
        this.name = name;
        inputs = new[] { input };
    }
    public override Tensor Execute(Tensor[] inputs, ExecutionContext ctx)
    {
        var x = inputs[0] as TensorFloat;
        var o = ctx.ops.Mul(x, ctx.ops.Sub(one, x));
        return o;
    }
}
```

Feel free to criticise my code. I haven't tried to optimise it.

**Explanation**

The model has an input of size 3. The first two numbers are from the set \{0,1\}, and the last one is always 1 (it acts as a bias).

It has a hidden layer of size 3.

It has an output layer of size 1.

The error is calculated by E = |x_{out} - x_{expected}|^2

The new weights are obtained by gradient descent using \delta \omega = \alpha \frac{\partial E}{\partial \omega}, where \alpha is the learning rate. Libraries like TensorFlow and PyTorch would calculate all the terms of this expression and turn them into additional nodes on the graph. Here I did it by hand, but it would be possible to implement this automatically.
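Spelled out, the by-hand derivation that the extra graph nodes implement (writing s_1 = \sigma(\omega_1 x_{in}) for the hidden activations, and using the identity \sigma'(y) = y(1-y) on activations the forward pass has already computed) is:

m_2 = x_{out}(1-x_{out})\,(x_{expected}-x_{out}), \qquad dW_2 = m_2\, s_1^T

m_3 = s_1(1-s_1)\odot(\omega_2^T m_2), \qquad dW_1 = m_3\, x_{in}^T

\omega_{new} = \omega_{old} + \alpha\, dW

Up to a constant factor of -2, which gets absorbed into \alpha, these dW terms are exactly -\partial E/\partial \omega, so adding them moves the weights downhill; the minus sign comes from the subtraction being expected - output. The names m2, dW2, m3 and dW1 match the node names in the code.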

**Conclusion**

This was quite a fun project and I learned a lot about back propagation. The tensor framework works very well. One thing I noticed is that, because it wires nodes together using string names, there is no compile-time check that you have named all your nodes correctly; mistakes only show up at runtime. I suppose it would be good practice to store your node names in constants.

It should also be possible to implement an automatic back propagation algorithm, so the gradients don't have to be derived by hand.
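To illustrate what "automatic" would mean, here is a toy scalar reverse-mode autodiff sketch (again plain Python, nothing to do with Sentis): each operation records how to push gradients back to its inputs, so dE/d\omega falls out of the graph instead of being derived by hand.

```python
import math

class Value:
    """A scalar that remembers how it was computed, for reverse-mode autodiff."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backprop = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backprop():  # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backprop = backprop
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backprop():  # d(ab)/da = b, d(ab)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backprop = backprop
        return out

    def sigmoid(self):
        s = 1.0 / (1.0 + math.exp(-self.data))
        out = Value(s, (self,))
        def backprop():  # sigma'(x) = y(1-y), same trick as XTimesOneMinusX
            self.grad += s * (1.0 - s) * out.grad
        out._backprop = backprop
        return out

    def backward(self):
        # build a topological order, then propagate gradients from the output
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backprop()

# Example: d/dw of sigmoid(w*x) recovers the hand-derived y(1-y)*x
w, x = Value(0.7), Value(2.0)
y = (w * x).sigmoid()
y.backward()
print(w.grad)
```

Growing this into graph-node form, where each op adds its backward pass as more layers, is essentially what TensorFlow and PyTorch do.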

When building this neural network I realised that it would have been easier with a visual graph tool like Shader Graph. That might not be practical for massive networks, but it would be useful for smaller ones.

Now that I have built this model, I would like to see its graph (in Netron, for example). How would I do this?

Another thing I wondered about is how you would save your trained neural network.

To change the values of the inputs I am disposing of them and then creating new ones. Is this the correct thing to do, or can I refill the tensor with new values?

Is there any advantage in building the model in layers, or could I just make it one very complicated Layer with lots of "ctx.ops" functions?

**THE END**

Here is a video showing the error slowly decreasing as it converges:

Edit: fixed a spelling mistake for W2 shape size.