Emotion recognition with FER+ ONNX Model

Hi all,

I am experimenting with getting the FER+ Emotion Recognition model working with Sentis. I’ve got Unity hooked up with the Webcam and the model loaded, but unfortunately I keep getting a “neutral” result regardless of facial expression. The output numbers do vary slightly, but not a lot. This is the code I am using:

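// Copy the current webcam frame into a Texture2D.
Texture2D texture2D = new Texture2D(webcamTexture.width, webcamTexture.height);
texture2D.SetPixels(webcamTexture.GetPixels());
texture2D.Apply();
// Resize to the 64x64 single-channel input and run the model.
TextureTransform textureTransform = new TextureTransform().SetDimensions(64, 64, 1);
TensorFloat inputTensor = TextureConverter.ToTensor(texture2D, textureTransform);
worker.Execute(inputTensor);
TensorFloat outputTensor = worker.PeekOutput() as TensorFloat;
float[] values = outputTensor.ToReadOnlyArray();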

I’m quite new to AI so I’m probably missing something! Any pointers would be greatly appreciated.

It seems you’re sending it colour values in the range 0…1, whereas from the documentation it looks like it’s expecting values in the range 0…255. (Well, to be fair, the documentation is unclear about what it expects!)

Seems like the TextureTransform() could do with specifying an output range for your tensor, e.g. [-1…1], [0…1], [0…255]. Would be a useful feature to have (unless I missed it).

Yes, looking at the model, it looks like it’s expecting input between 0 and 255, which it then re-normalizes to between -1 and 1.
In Unity, textures are between 0 and 1, so it makes sense that your model doesn’t produce the right results.
You have two solutions to this:

  • You can use an Op (cf. the ExecuteOperatorOnTensor sample):
op.Mul(tensorInput, new TensorFloat(255));
  • You can edit the imported Model:
model.constants.Add(new Constant(newconstant_name, new TensorFloat(255)));
model.layers.Insert(0, new Layers.Mul("newlayer_name", model_input, newconstant_name));
model.layers[1].inputs[0] = "newlayer_name";

Thanks for these replies! Adding the line to multiply the tensor by 255 seemed to help a bit. It is still erring towards neutral as a result though.

Do you think I need to do any more processing of the image? Will the TextureTransform automatically make it monochrome or should I do that manually?
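
On the monochrome question: the thread does not confirm what TextureTransform does when asked for a single channel, so one way to rule it out is to do the conversion by hand before building the tensor. Below is a minimal sketch in plain C#, assuming interleaved RGB floats in 0…1 (for example, flattened from GetPixels()) and the Rec. 601 luma weights. The class and method names are made up for illustration, and the scaling by 255 follows the replies above.

```csharp
using System;

public static class GrayscaleSketch
{
    // Converts interleaved RGB values in [0, 1] to one luminance value per
    // pixel, scaled to [0, 255]. The Rec. 601 weights are an assumption.
    public static float[] ToLuma255(float[] rgb)
    {
        if (rgb.Length % 3 != 0)
            throw new ArgumentException("Expected interleaved RGB data.", nameof(rgb));

        var luma = new float[rgb.Length / 3];
        for (int i = 0; i < luma.Length; i++)
        {
            float r = rgb[3 * i];
            float g = rgb[3 * i + 1];
            float b = rgb[3 * i + 2];
            luma[i] = 255f * (0.299f * r + 0.587f * g + 0.114f * b);
        }
        return luma;
    }
}
```

The resulting array has one value per pixel, so it would still need resizing to 64x64 before it matches the model input.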


Which line(s) did you add?

I found the second option works fine after Model model = ModelLoader.Load(…);

model.constants.Add(new Unity.Sentis.Layers.Constant("scale_factor", new TensorFloat(255)));
model.layers.Insert(0, new Unity.Sentis.Layers.Mul("scaled_input", "Input3", "scale_factor"));
model.layers[1].inputs[0] = "scaled_input";

(Or replace “Input3” with whichever is the name of the input in your model).

I tried it (not with a webcam) but with just images of different exaggerated faces as textures. Got “happy” and “angry” to come up top.
So if it’s still not working even with images instead of the webcam, you are right, it might be due to a noisy or dark webcam image. (Or even upside down!) Also, it’s an old model so not state of the art.
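
For reading the results, here is a sketch of how the raw output could be ranked, in plain C#. It assumes the model emits eight unnormalised scores in the order given in the FER+ model description (neutral, happiness, surprise, sadness, anger, disgust, fear, contempt), so please check that against your copy of the model. The helper names are hypothetical.

```csharp
using System;

public static class EmotionScores
{
    // Label order is an assumption taken from the FER+ model description.
    public static readonly string[] Labels =
    {
        "neutral", "happiness", "surprise", "sadness",
        "anger", "disgust", "fear", "contempt"
    };

    // Numerically stable softmax: subtract the maximum before exponentiating.
    public static float[] Softmax(float[] scores)
    {
        float max = float.NegativeInfinity;
        foreach (float s in scores) max = Math.Max(max, s);

        var result = new float[scores.Length];
        float sum = 0f;
        for (int i = 0; i < scores.Length; i++)
        {
            result[i] = (float)Math.Exp(scores[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.Length; i++) result[i] /= sum;
        return result;
    }

    // Index of the largest value (first one wins on ties).
    public static int ArgMax(float[] values)
    {
        int best = 0;
        for (int i = 1; i < values.Length; i++)
            if (values[i] > values[best]) best = i;
        return best;
    }
}
```

With the values array from the first post, EmotionScores.Labels[EmotionScores.ArgMax(EmotionScores.Softmax(values))] would give the top label, and the softmax output makes it easier to see how close the runner-up is.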

Hope you get it working. Here are the images I used to test it:

Not subtle lol.

Hi yoonitee,

Thanks for your help! I tried it out with the two test images you attached and it works perfectly! So this must mean that there is too much going on in the webcam photo. I’m thinking my next step will be to use a model that can detect a face in a picture, then use that to trim the webcam image and feed it into the emotion recognition model.
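
The cropping step of that plan could look roughly like this. It is a sketch only: it assumes the face detector (not shown, and not something this thread has picked yet) hands back a pixel rectangle, and that the image is a row-major single-channel float array. All names here are hypothetical.

```csharp
using System;

public static class CropSketch
{
    // Copies a rectangular region out of a row-major, single-channel image.
    // (x, y) is the corner of the region nearest the first stored row.
    public static float[] Crop(float[] pixels, int width, int height,
                               int x, int y, int cropWidth, int cropHeight)
    {
        if (pixels.Length != width * height)
            throw new ArgumentException("Buffer size does not match width * height.", nameof(pixels));
        if (x < 0 || y < 0 || cropWidth <= 0 || cropHeight <= 0 ||
            x + cropWidth > width || y + cropHeight > height)
            throw new ArgumentOutOfRangeException(nameof(x), "Crop rectangle must lie inside the image.");

        var result = new float[cropWidth * cropHeight];
        for (int row = 0; row < cropHeight; row++)
        {
            // Copy one row of the rectangle at a time.
            Array.Copy(pixels, (y + row) * width + x, result, row * cropWidth, cropWidth);
        }
        return result;
    }
}
```

One thing to watch for: Unity's GetPixels() stores rows bottom to top, so a rectangle reported with a top-left origin would need its y coordinate flipped first.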


I also ran into this issue while using the GoogleNet age and gender classification models and Inception v2.
Scaling the input by 255 got me one step further, yet many of the results I get still seem wrong. The GoogleNet documentation mentions that the models expect data in BGR format. Could this be an issue?

In that case, try TextureTransform.SetChannelSwizzle(ChannelSwizzle.BGRA).
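
For anyone wondering what the swizzle amounts to: as far as the data goes, BGR versus RGB is just the first and third channel of each pixel trading places. Here is a plain C# sketch on an interleaved buffer, for illustration only (the names are hypothetical, and this is not a claim about how Sentis implements it):

```csharp
using System;

public static class SwizzleSketch
{
    // Swaps the first and third channel of every pixel in place,
    // turning RGB(A) data into BGR(A) data or the reverse.
    public static void SwapRedBlue(float[] pixels, int channels)
    {
        if (channels < 3 || pixels.Length % channels != 0)
            throw new ArgumentException("Expected interleaved data with at least 3 channels.");

        for (int i = 0; i < pixels.Length; i += channels)
        {
            float first = pixels[i];
            pixels[i] = pixels[i + 2];
            pixels[i + 2] = first;
        }
    }
}
```

Calling it twice restores the original order, which makes it handy for checking whether a model really wants BGR: run the same image both ways and compare the scores.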

Thanks. That somewhat improved my results. But I will reevaluate my test set, as Inception still gives me mostly wrong outputs.

In addition to the BGR issue, I’ve come across this super resolution model and played around with it. It accepts images in RGB format, but it seems to convert them internally into YCbCr format, which is also the format of the output. Are conversions to and from this (and other) formats possible? If not, will they be possible in the future?

I’d suggest writing your own compute shader to do the conversion, then; that’s probably the best option.
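
As a CPU reference to check a shader against, here is a sketch of the conversion in plain C#. It assumes the full-range BT.601 variant (the one used by JPEG) with all values in 0…1. The thread does not say which YCbCr variant the super resolution model uses, so treat the coefficients as an assumption to verify. The names are hypothetical.

```csharp
using System;

public static class YCbCrSketch
{
    // RGB in [0, 1] to full-range BT.601 YCbCr in [0, 1] (assumed variant).
    public static (float y, float cb, float cr) FromRgb(float r, float g, float b)
    {
        float y = 0.299f * r + 0.587f * g + 0.114f * b;
        float cb = 0.5f - 0.168736f * r - 0.331264f * g + 0.5f * b;
        float cr = 0.5f + 0.5f * r - 0.418688f * g - 0.081312f * b;
        return (y, cb, cr);
    }

    // Inverse of FromRgb, up to rounding error.
    public static (float r, float g, float b) ToRgb(float y, float cb, float cr)
    {
        float r = y + 1.402f * (cr - 0.5f);
        float g = y - 0.344136f * (cb - 0.5f) - 0.714136f * (cr - 0.5f);
        float b = y + 1.772f * (cb - 0.5f);
        return (r, g, b);
    }
}
```

The same arithmetic ports to a compute shader more or less line for line, and a handful of known colours run through both versions gives a quick sanity check.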

Hey @liamgh, I used that FER dataset from Kaggle 5 years ago. That dataset is very unbalanced and unclean. Many images have a wrong label (bad for training and classification), others have large black areas that make them useless, and there are other artifacts.
Thus, if your code is working well, it is simply the unclean dataset the model was trained on that leads to wrong classifications during inference.


@Christin2015 might you know of other open-source pre-trained ONNX models that were trained on cleaner data, and are perhaps more state of the art? If anyone else happens to know, please let me know!

I’m also exploring FER/emotion recog models and am looking for something that can reliably output scores, at least for happy/sad.

Edit: According to this article, the Kaggle FER+ model is Microsoft’s extension of the original Kaggle competition model - a reviewed and cleaned up version.