I’m creating an app where a user chats with an NPC powered by OpenAI. I want the app to automatically detect when the user starts speaking, process the microphone input (e.g. send the audio to OpenAI for speech-to-text transcription), and detect when the speech has stopped.
Meta’s Wit AI can capture mic audio and transcribe it, but it offers no automatic voice detection: you have to press a key or button first to let it know you’re speaking, and I don’t want that. Can anyone point me in the direction of what I want, i.e. an existing software solution?
Thanks for the suggestion. I tried that; it worked sometimes, but not reliably. I found a Python library that does pretty good voice detection, and I ran its code in a WebSocket server connected to my Unity app. It’s a bit messy, but it works.
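For anyone wondering what server-side voice detection could look like, here is a minimal sketch. The library used above isn't named in this thread, so this is not that library's code: it is a plain energy-threshold detector using only the Python standard library. The names `frame_rms` and `detect_utterances`, the 16-bit mono PCM frame format, and the threshold and frame-count values are all assumptions for illustration and would need tuning for a real microphone.

```python
import math
import struct
from typing import Iterable, Iterator, List


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_utterances(
    frames: Iterable[bytes],
    threshold: float = 500.0,  # assumed loudness cutoff; tune per microphone
    start_frames: int = 3,     # consecutive loud frames needed to start
    end_frames: int = 10,      # consecutive quiet frames needed to stop
) -> Iterator[bytes]:
    """Yield the audio of each detected utterance (trailing silence included)."""
    in_speech = False
    silent_run = 0
    pending: List[bytes] = []  # loud frames seen before speech is confirmed
    buffer: List[bytes] = []   # frames of the utterance in progress

    for frame in frames:
        loud = frame_rms(frame) >= threshold
        if not in_speech:
            if loud:
                pending.append(frame)
                if len(pending) >= start_frames:
                    # Enough consecutive loud frames: speech has started.
                    in_speech = True
                    buffer = pending
                    pending = []
                    silent_run = 0
            else:
                # A quiet frame resets the start counter (ignores short clicks).
                pending = []
        else:
            buffer.append(frame)
            silent_run = 0 if loud else silent_run + 1
            if silent_run >= end_frames:
                # Enough consecutive quiet frames: speech has stopped.
                yield b"".join(buffer)
                in_speech = False
                buffer = []

    # The stream ended while the user was still speaking.
    if in_speech and buffer:
        yield b"".join(buffer)
```

In a setup like the one described, each incoming WebSocket message from the Unity app could be treated as one frame, and each yielded utterance could then be sent off for transcription. A simple loudness threshold like this tends to struggle with background noise, which is one reason a dedicated voice detection library may be the better choice.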
@Tyke18 @FarmerInATechStack I have the same question as FarmerInATechStack.
I’m trying to use Azure streaming speech-to-text for the transcription.
I also need automatic voice detection so that it stops correctly.
I’d also like to ask:
Can you please point me to which Python library helped?
(Same question) Does Wit AI offer speech to text transcription too, or did you find that Open AI was the only solution?
@carton22liu_unity I’ve gotten text-to-speech working using the OpenAI options and some scripts for microphone recording. However, I’m not doing automatic speech detection. I press a button to start recording from the mic.
If interested, I’m also on Discord at farmerinatechstack
Nice, you can probably skip that if you’re up for just using the APIs directly, but it can also be really nice to have a solution that “just works” and that someone else maintains.