Dear all,
Recently I have become fascinated by voice technology. My kids (4 and 7) are fans of Google Voice Search. It's so much fun for them to ask Google about all kinds of things they are curious about.
So I thought it would be cool to use voice to control games! I mean using only short commands like jump, gun, fire, heal, drop, left, right, stop, etc. The commands would need to be recognized extremely fast.
I searched the Asset Store for an asset that does this, but I couldn't find one that fits my needs. There are assets that wrap cloud-based or OS-based speech recognition services. They are great for their use cases, but they can be slow, consume bandwidth and battery, may not provide a smooth user experience, and can be expensive to scale if cloud-based speech recognition services are required [1].
So I thought of building a lightning-fast, on-device voice command system embedded directly inside games. It would work on Android and iOS with zero dependency on OS speech recognition or an internet connection. It would scale as big and as fast as you want when your games attract hundreds of thousands of users, because everything is done on-device, fast and efficient.
It would be great to hear your thoughts. How should the library be designed to be easy to use and to fit most games? What voice commands would you want first for your games?
Hi, I am not entirely sure what you are looking for, but I am guessing you want this done on the phone.
I have zero coding skills so I cannot help (sorry), but I think it is a cool idea. After a couple of days of thinking about it in the back of my mind, I did a search on the store for speech to text. So I am guessing what you could try is to use speech to text: the code would see the text "jump" and then do the matching action. In other words, speech to text gives "jump", and "jump" means move the object along the Y axis, if I have the thought right.
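The idea above (recognized text in, game action out) can be sketched as a small lookup table. This is only an illustration in plain C#: `VoiceCommandDispatcher`, `Register` and `Dispatch` are hypothetical names made up for this sketch, not part of any existing asset, and it assumes some recognizer already hands you the recognized word as a string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of any existing asset.
// Maps a recognized word (for example "jump") to a game action.
public class VoiceCommandDispatcher
{
    // Case-insensitive so "Jump" and "jump" trigger the same handler.
    private readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    public void Register(string command, Action handler)
    {
        handlers[command] = handler;
    }

    // Returns true if the recognized text matched a registered command.
    public bool Dispatch(string recognizedText)
    {
        if (string.IsNullOrWhiteSpace(recognizedText))
        {
            return false;
        }

        if (handlers.TryGetValue(recognizedText.Trim(), out Action handler))
        {
            handler();
            return true;
        }

        return false;
    }
}
```

In a Unity script, the handler registered for "jump" could be whatever moves the object along the Y axis, and `Dispatch` would be called with each string the recognizer produces. Unknown words are simply ignored.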
Hi Mark_01,
I am planning to implement voice command recognition for Unity. The system should be tiny, fast and accurate. It should be cross-platform (Android and iOS for now). It should not depend on the OS (Android or iOS) to do speech-to-text recognition. It should work offline on the device, with no internet connection required.
Speech to text (STT) can be a good start. But STT systems might not be the best fit for the use case I mentioned. If STT systems use deep neural networks, they tend to be quite big (1 GB of memory required) [1]. So accurate STT systems tend to be hosted in the cloud. To make them smaller, accuracy is often sacrificed.
Interestingly, for games, we might need just a small set of predefined commands (20-30 commands, for example). We could build a tiny (less than 1 MB) voice command recognition system that has the characteristics I mentioned above.
I would like to hear thoughts about the idea, different use-cases (for different types of games) and any set of commands that can be practically used. If the idea makes sense and can be useful, I will start implementing it.
Hi trungnq97,
Did you manage to implement DeepSpeech in Unity to run on Android? I am looking to do something similar and wanted to know if this option is viable. More specifically, I'm looking to implement speech recognition on an Oculus. Any comments or suggestions?
I am trying to use DeepSpeech. I imported the .dll and .so files and I can use their namespace, but I was unable to load a model because it says libdeepspeech.so not found. (I have already placed it under the Plugins/x86_x64 folder…) I managed to run it in a normal .NET Core project in Visual Studio, just no luck in Unity…
Is it possible to somehow combine the .dll with the .so file? (This might be a stupid question, but I am quite new to libraries and binary stuff.) If you managed to implement DeepSpeech, let me know!
Have you managed to figure this out? I am also trying to implement STT with DeepSpeech in Unity and I'm getting the same error you were getting, where it says libdeepspeech.so is not found.
Hey yo! Glad to know that there are people interested in it! Yes, I did make a working version!
link here: https://github.com/voxell-tech/UnityASR
Let me know if you get any errors from using it. I will make a video on my YouTube channel on how to set things up in the near future (https://youtube.com/voxelltech) and also improve the README XD. For now, enjoy DeepSpeech!
Hey @carlordvr, I have speech to text working already. It is using DeepSpeech currently. I haven't put up a demo scene showing how to use it yet. I will make one ASAP and post it here to keep you guys updated! (Will update the README also ofc hahaha)
Hi guys, I have made a mini demo of real-time DeepSpeech STT, but it doesn't recognize words as intended and also crashes Unity. I am still figuring it out. If you guys have any ideas, please do let me know!
btw this is all in a separate branch called deepspeech: https://github.com/voxell-tech/UnityASR/tree/deepspeech
Also, here's a mini snippet of code to test if your DeepSpeech setup works! Read the README first and follow the installation steps!
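using System;
using UnityEngine;
using DeepSpeechClient;
using Voxell.Inspector;

public class DeepSpeechTest : MonoBehaviour
{
    public string modelPath;
    // DeepSpeech models typically expect 16 kHz mono audio, so use a mono clip.
    public AudioClip clip;

    // Triggered from the inspector button: decodes the whole clip once and logs the text.
    [Button]
    void Test()
    {
        // GetData returns interleaved samples, so size the buffer for every channel.
        float[] floatData = new float[clip.samples * clip.channels];
        clip.GetData(floatData, 0);
        short[] shortData = AudioFloatToInt16(floatData);

        DeepSpeech sttClient = new DeepSpeech(modelPath);
        try
        {
            string speechResult = sttClient.SpeechToText(shortData, (uint)shortData.Length);
            Debug.Log(speechResult);
        }
        finally
        {
            // Release the native model even if decoding throws.
            sttClient.Dispose();
        }
    }

    // Converts Unity's float samples in [-1, 1] to 16-bit PCM.
    private static short[] AudioFloatToInt16(float[] data)
    {
        short[] shorts = new short[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            // Clamp first so an out-of-range sample cannot overflow Convert.ToInt16.
            shorts[i] = Convert.ToInt16(Mathf.Clamp(data[i], -1f, 1f) * Int16.MaxValue);
        }
        return shorts;
    }
}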
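If you want to try the audio conversion outside Unity first, here is a minimal sketch of the same float-to-16-bit step in plain C#, plus a simple stereo-to-mono downmix. `PcmConversion`, `DownmixToMono` and `ToInt16` are hypothetical names made up for this example, not part of UnityASR or DeepSpeech, and the downmix assumes the samples are interleaved by channel (which is how Unity's `AudioClip.GetData` returns them).

```csharp
using System;

// Hypothetical helper for illustration only; not part of UnityASR or DeepSpeech.
public static class PcmConversion
{
    // Averages interleaved channels into a single mono channel.
    public static float[] DownmixToMono(float[] interleaved, int channels)
    {
        if (channels <= 1)
        {
            return interleaved;
        }

        int frames = interleaved.Length / channels;
        float[] mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
            {
                sum += interleaved[f * channels + c];
            }
            mono[f] = sum / channels;
        }
        return mono;
    }

    // Converts float samples to 16-bit PCM, clamping to [-1, 1] first
    // so out-of-range samples saturate instead of overflowing.
    public static short[] ToInt16(float[] samples)
    {
        short[] result = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            result[i] = (short)Math.Round(clamped * short.MaxValue);
        }
        return result;
    }
}
```

This does not resample, so a clip recorded at a different sample rate than the model expects would still need to be converted separately.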
Edit: the new script is called AutomaticSpeechRecognition.cs. It uses the default microphone to take in speech and decodes it in a separate thread. I'm not sure what I did wrong yet.
Warning: once you exit play mode, Unity will crash, at least in my case.
Also, I think that the title of this thread does not correctly describe what we are dealing with here, so I created a new one: Unity Automatic Speech Recognition