Hi guys, I recently created a simple implementation of DeepSpeech in Unity, but I have been unable to get the DeepSpeech streaming module, which enables real-time speech-to-text transcription, to work.
You can find my demo here: GitHub - nixon-voxell/UnityASRExamples: Examples of Automatic Speech Recognition in Unity
And the core package here: GitHub - nixon-voxell/UnityASR: Automatic Speech Recognition in Unity.
Thank you! I got the AutomaticSpeechRecognition to work and will be pushing up the fixed example later today!
Oh wow, thank you so much! On a side note, did the previous version crash Unity on your side? For me, every time I exit Play mode, Unity crashes for some mysterious reason… XD
I decided to start from scratch and created a lightweight implementation. It processes the audio on a separate thread. Here is the link if you want to check it out - GitHub - Babilinski/deep-speech-unity: A Unity implementation of DeepSpeech, an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices.
Unity crashes because the thread is still writing to DeepSpeech when you exit Unity. Make sure no write to DeepSpeech is in progress before you try to dispose of it.
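The advice above can be sketched roughly as follows. This is a minimal illustration, not the code from either repo: `FakeSpeechStream` is a hypothetical stand-in for the native DeepSpeech stream, and `SafeSpeechWorker` is an assumed wrapper name. The idea is to signal the worker thread to stop, wait for it to finish, and only then dispose the stream.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the native speech stream; NOT the real DeepSpeech API.
public sealed class FakeSpeechStream : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Feed(short[] samples)
    {
        // Writing after disposal is the situation that would crash native code.
        if (IsDisposed) throw new ObjectDisposedException(nameof(FakeSpeechStream));
    }

    public void Dispose() => IsDisposed = true;
}

// Assumed wrapper: owns the worker thread and the stream it writes to.
public sealed class SafeSpeechWorker : IDisposable
{
    private readonly FakeSpeechStream _stream = new FakeSpeechStream();
    private readonly Thread _thread;
    private volatile bool _running = true;

    public bool StreamDisposed => _stream.IsDisposed;

    public SafeSpeechWorker()
    {
        _thread = new Thread(WorkLoop) { IsBackground = true };
        _thread.Start();
    }

    private void WorkLoop()
    {
        var buffer = new short[160]; // placeholder audio chunk
        while (_running)
        {
            _stream.Feed(buffer);
            Thread.Sleep(1);
        }
    }

    public void Dispose()
    {
        _running = false;   // ask the worker to stop writing
        _thread.Join();     // wait until no write is in flight
        _stream.Dispose();  // only now release the stream
    }
}
```

In a Unity project, a component holding such a worker would presumably call `Dispose()` from `OnDestroy` or `OnApplicationQuit`, so the thread has stopped before the native resources are freed.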
This is absolutely gold! Thank you so much!
Very interesting!
It would be even better if you could make it work with Coqui STT (a fork of DeepSpeech), because DeepSpeech development has slowed down. The core developers of DeepSpeech moved over to Coqui STT. More information here:
https://discourse.mozilla.org/t/a-new-in-speech-tech-town/77026
https://discourse.mozilla.org/t/future-of-deepspeech-stt-after-recent-changes-at-mozilla/66191
Edit:
I managed to port Coqui STT to Unity using the DeepSpeech project from @kbabilinski as a reference. I made a tutorial with source files. It is available here:
https://bitbarrelmedia.wordpress.com/2021/09/19/coqui-stt-in-unity/
I slightly modified the ContinuousVoiceRecorder and SpeechTextToText scripts to make them work with Coqui STT, along with a few other minor improvements such as no longer requiring a hard-coded path to the model files.
This is really awesome! Will check it out! Thanks!
Good, but why does this error appear for me:
FileNotFoundException: Cannot find the model file: C:\Users\kbabi\Documents\GitHub\Deep-Speech\Assets\StreamingAssets\deepspeech-0.9.3-models.pbmm
DeepSpeechClient.DeepSpeech.CreateModel (System.String aModelPath)
You need to download the DeepSpeech model (if you are using my repo, here is the model: https://drive.google.com/file/d/1RA9MDwconsoPexjngivo2jthiAG6ZQNa/view?usp=sharing)
I have not updated my repo yet, though; I will try to implement kbabilinski’s solution ASAP.
Edit: the model should work with kbabilinski’s repo too.
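The FileNotFoundException above points at a path on another machine, which suggests a hard-coded model path. One way to avoid that, sketched here under assumptions, is to build the path at runtime and fail with a clear message if the model has not been downloaded. `ModelPathResolver` and its parameters are hypothetical names, not part of either repo. In Unity on desktop platforms you would pass `Application.streamingAssetsPath` as the root folder.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of the DeepSpeech client or either repo.
public static class ModelPathResolver
{
    // streamingAssetsRoot: in Unity, pass Application.streamingAssetsPath.
    // modelFileName: e.g. "deepspeech-0.9.3-models.pbmm" from the error above.
    public static string Resolve(string streamingAssetsRoot, string modelFileName)
    {
        string path = Path.Combine(streamingAssetsRoot, modelFileName);

        // Check up front so the error names the location where the file is expected.
        if (!File.Exists(path))
        {
            throw new FileNotFoundException(
                $"Model file not found. Download it and place it at: {path}", path);
        }

        return path;
    }
}
```

The returned path can then be handed to whatever call creates the model, so the project works regardless of where it was cloned.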
Hi, I have your implementation working on my Windows machine but would very much like to get it working on an Android device. Do you have any pointers for me? I followed your tutorial and put libstt.so in the Android folder, but when running on the device it says it can’t find the lib. I have also tried moving it to 'Assets/Plugins/Android', but it still isn’t found. Do I need to explicitly build a binary for Android? Thanks!
No idea. I moved on to another project.