Automatic Speech Recognition using Unity Sentis

kiranmgcv · February 7, 2024, 10:33am

I want to create an Automatic Speech Recognition implementation in Unity using Unity Sentis. The main requirements are that the speech recognition must work offline and should not need any buttons to trigger the recognition. The target platform is Android. Can anyone help me with this implementation?

PaulBUnity · February 8, 2024, 7:13am

Have you tried our Whisper demo to get you started?

kiranmgcv · February 8, 2024, 9:13am

Hi Paul,
Thanks for your assistance. I downloaded the model you mentioned and followed the instructions in the model card. But once I click the Play button, I get the message “All compiler errors needs to be fixed before entering play mode”, even though I don’t have any compiler errors. I have checked the Console panel thoroughly. Kindly help me in this regard.

kiranmgcv · February 8, 2024, 9:24am

Hi Paul,
I somehow figured out the issue, and now the audio input is transcribed to text without any problems. What I need next is to transcribe the user’s audio in real time, without clicking any buttons, and compare the transcribed text with a set of words. These words are the commands: when the transcribed text matches any command in the list, it should carry out an action such as switching scenes. Kindly help me with this implementation.
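
For anyone reading along, the matching step described above could look roughly like the sketch below. It assumes the transcription arrives as a plain string. VoiceCommandRouter and its methods are hypothetical names for illustration, not part of Sentis or the Whisper demo.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: maps normalized command phrases to actions.
public class VoiceCommandRouter
{
    private readonly Dictionary<string, Action> commands = new Dictionary<string, Action>();

    // Register a phrase and the action to run when it is heard.
    public void Register(string phrase, Action action)
    {
        commands[Normalize(phrase)] = action;
    }

    // Runs the matching action and returns true, or returns false if nothing matched.
    public bool TryExecute(string transcription)
    {
        if (commands.TryGetValue(Normalize(transcription), out Action action))
        {
            action();
            return true;
        }
        return false;
    }

    // Lowercases the text, drops punctuation, and collapses whitespace,
    // so that " Next level. " and "next level" compare equal.
    public static string Normalize(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text.ToLowerInvariant())
        {
            if (char.IsLetterOrDigit(c))
                sb.Append(c);
            else if (char.IsWhiteSpace(c) && sb.Length > 0 && sb[sb.Length - 1] != ' ')
                sb.Append(' ');
        }
        return sb.ToString().Trim();
    }
}
```

In a Unity project the registered action could call SceneManager.LoadScene with a scene name of your own. This does exact matching only, so a transcription with extra words would need a looser comparison.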

PaulBUnity · February 9, 2024, 4:45am

Funny you should mention this: Thomas from Hugging Face has just put up a tutorial for almost exactly this.

kiranmgcv · February 9, 2024, 5:20am

Hi Paul,
Thanks for suggesting this one. It came in very handy, and I was able to create a speech recognition tool with the provided instructions. But I want to run the speech recognition without clicking any buttons. I am also trying to use the transcribed text as commands to navigate through my game. Can you provide your suggestions for this implementation?

PaulBUnity · February 9, 2024, 5:27am

Did you see this link to this demo? I think it does what you want. It is a robot that responds to voice commands.

To respond to continual audio without buttons you would want to have a look at streaming audio from a microphone. Then you’ll generally want to detect when the volume of the mic goes above a certain threshold to get the start of the recording.
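
A minimal sketch of that threshold idea is below, assuming the microphone audio is read in short frames of float samples in the range -1 to 1. VolumeGate, its default threshold of 0.02 and the 10-frame hangover are illustrative assumptions to tune per device, not values from the demo.

```csharp
using System;

// Hypothetical helper: flags the start and end of speech from frame volume.
public class VolumeGate
{
    private readonly float threshold;    // RMS level treated as speech (assumed, tune per device)
    private readonly int hangoverFrames; // consecutive quiet frames that end an utterance
    private int quietFrames;

    // True from the first loud frame until the utterance is considered finished.
    public bool IsSpeaking { get; private set; }

    public VolumeGate(float threshold = 0.02f, int hangoverFrames = 10)
    {
        this.threshold = threshold;
        this.hangoverFrames = hangoverFrames;
    }

    // Root-mean-square volume of one frame of samples.
    public static float Rms(float[] samples)
    {
        if (samples.Length == 0) return 0f;
        double sum = 0;
        foreach (float s in samples) sum += s * s;
        return (float)Math.Sqrt(sum / samples.Length);
    }

    // Feed one frame; returns true only on the frame where speech ends.
    public bool Process(float[] frame)
    {
        if (Rms(frame) >= threshold)
        {
            IsSpeaking = true;
            quietFrames = 0;
            return false;
        }
        if (!IsSpeaking) return false;

        quietFrames++;
        if (quietFrames < hangoverFrames) return false;

        IsSpeaking = false;
        quietFrames = 0;
        return true;
    }
}
```

In Unity the frames could come from a looping clip created with Microphone.Start, read each update with AudioClip.GetData up to Microphone.GetPosition. Once Process returns true, the samples collected since speech started can be passed to the model.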

kiranmgcv · February 9, 2024, 6:25am

Hi Paul,
I watched Thomas’s demo all the way through. It seems he uses button clicks to trigger the speech recognition. I’ll try the method you just suggested and will let you know the results.

kiranmgcv · February 9, 2024, 9:48am

Hi Paul,
I tried the method you suggested, but it didn’t turn out well: it is not recognizing the audio. Can you suggest any alternative methods or instructions to achieve this?

PaulBUnity · February 23, 2024, 5:21pm

Hi @kiranmgcv. One thing to check: the Whisper model won’t work if there is silence at the beginning of the audioClip. Could this be the issue? Also, the audio needs to be recorded from the mic in mono at 16000 Hz. Are you able to provide an audio clip we can test? Or is it the microphone implementation you are having trouble with?
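
As a rough sketch of those two checks, the helper below trims leading silence and mixes interleaved channels down to mono. AudioPrep, its method names and the 0.01 silence threshold are hypothetical and would need tuning for a real microphone.

```csharp
using System;

// Hypothetical helper for preparing recorded samples before transcription.
public static class AudioPrep
{
    // Sample rate mentioned above for the Whisper model.
    public const int WhisperSampleRate = 16000;

    // Returns a copy of the samples with leading values below the threshold removed.
    // The 0.01 default is an assumed value, not one taken from the demo.
    public static float[] TrimLeadingSilence(float[] samples, float threshold = 0.01f)
    {
        int start = 0;
        while (start < samples.Length && Math.Abs(samples[start]) < threshold)
            start++;

        float[] trimmed = new float[samples.Length - start];
        Array.Copy(samples, start, trimmed, 0, trimmed.Length);
        return trimmed;
    }

    // Averages interleaved channels into a single mono channel.
    public static float[] ToMono(float[] interleaved, int channels)
    {
        if (channels <= 1) return (float[])interleaved.Clone();

        int frames = interleaved.Length / channels;
        float[] mono = new float[frames];
        for (int i = 0; i < frames; i++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
                sum += interleaved[i * channels + c];
            mono[i] = sum / channels;
        }
        return mono;
    }
}
```

In Unity the sample rate can be requested through the frequency argument of Microphone.Start, and the samples read with AudioClip.GetData. It is worth checking the resulting clip’s frequency and channels properties, since a device may not honor the request.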

nafis_fuad1234 · March 7, 2024, 9:56pm

I noticed the model is in Sentis format. Is there a tutorial or script where the model is in ONNX format, so that I can try other ASR models too?

PaulBUnity · March 8, 2024, 12:10am

You can convert any ONNX model to the Sentis format, which is advisable for large files.

Alternatively just use:

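using Unity.Sentis;

public ModelAsset yourONNXmodel; // public field, shown in the Inspector

var model = ModelLoader.Load(yourONNXmodel); // call inside a method such as Start()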

and drop the ONNX onto the field.
