We’re trying to integrate the Bing Speech to Text API into our Unity application, but the client library didn’t work out because Unity doesn’t support .NET 4.5. So we went the route of making HTTP requests ourselves, following the sample code provided here: https://oxfordportal.blob.core.windows.net/speech/doc/recognition/Program.cs . We were able to use the Unity Microphone class (Unity - Scripting API: Microphone) to record audio, save it to a .WAV file, and then submit a POST request to /recognize to get the transcription back, but this resulted in high latency.
The Bing Speech API sample code says “Input your own audio file or use read from a microphone stream directly.”, but it gives no indication of how to read from a microphone stream directly. The Unity Microphone class doesn’t seem to offer many properties, events, or methods we can use for this. Any ideas? Thanks!
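To make the question more concrete, here is a rough sketch of the polling logic we imagine would be needed. It is written in Python for brevity rather than Unity C#, and it is not a working integration. It assumes the microphone records into a looping buffer of float samples in the range [-1, 1], and that the current write position can be polled each frame (in Unity we would expect `Microphone.GetPosition` and `AudioClip.GetData` to play those roles). The function names below are hypothetical, and whether /recognize accepts audio sent in pieces like this is exactly what we are unsure about.

```python
import struct


def read_new_samples(ring, last_pos, write_pos):
    """Return the samples written to a looping buffer since the last poll.

    ring      -- list of float samples (the whole looping buffer)
    last_pos  -- index where the previous poll stopped reading
    write_pos -- index the recorder will write to next
    """
    if write_pos >= last_pos:
        # No wrap-around: the new data is one contiguous slice.
        return ring[last_pos:write_pos]
    # The recorder wrapped past the end of the buffer: stitch tail and head.
    return ring[last_pos:] + ring[:write_pos]


def floats_to_pcm16(samples):
    """Convert float samples in [-1, 1] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def next_chunk(ring, last_pos, write_pos):
    """Poll once: return (pcm_bytes, new_last_pos) for the newly recorded audio."""
    fresh = read_new_samples(ring, last_pos, write_pos)
    return floats_to_pcm16(fresh), write_pos
```

The idea would be to call something like `next_chunk` on every frame and write the returned bytes into the body of an already open request, instead of waiting for the complete .WAV file.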