PCMReaderCallback, Native Audio Plugins & Spatialization

I’m working on a plugin for Unity that (amongst other things) streams audio from an external source. This is a slightly unusual use-case but there seem to be a couple of ways to go about it, both with drawbacks.

CreateAudioClip with PCMReaderCallback
This seems to work well enough and interacts the way you'd expect with 3D sound and any spatialization plugin you have active. You can place an AudioSource anywhere and the streamed data will naturally play from it. This is the most "natural" way of doing it.

Unfortunately there's quite high latency between callbacks, and as far as I can tell the timing information in AudioSource.timeSamples is not at all accurate. I haven't evaluated performance, but I can't imagine the native-to-managed-to-native call overhead is great.

Native Audio Plugin
This seems to be the more modern and performant way to stream data into the Mixer system. It's native C++, which is perfect in my case.

The drawback is that it seems to be completely separate from the AudioSource system. Since the mixer runs after spatialization, I'd have to write my own solution for panning or 3D sound.
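To give an idea of what "my own solution for panning" would involve, here is a minimal sketch of a constant-power pan stage you could run over a mono stream before handing it to a native plugin. The function name and buffers are illustrative, not part of any Unity API:

```csharp
// Minimal constant-power pan sketch (illustrative, not a Unity API).
// pan: -1 = full left, 0 = center, +1 = full right.
static void ApplyConstantPowerPan(float[] mono, float[] stereoOut, float pan)
{
    // Map pan from [-1, 1] to an angle in [0, pi/2]; sin/cos gains keep
    // the total power roughly constant across the sweep.
    float angle = (pan + 1f) * 0.25f * (float)System.Math.PI;
    float leftGain = (float)System.Math.Cos(angle);
    float rightGain = (float)System.Math.Sin(angle);

    for (int i = 0; i < mono.Length; i++)
    {
        stereoOut[2 * i] = mono[i] * leftGain;      // left channel
        stereoOut[2 * i + 1] = mono[i] * rightGain; // right channel
    }
}
```

Full 3D sound (distance attenuation, HRTF, etc.) is of course much more than this, which is why losing the AudioSource path hurts.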

DIY
(I suppose I could use a completely separate native audio library, which I've noticed several plugins, like video players, use. This seems like the worst of both worlds, although I suppose it's the most flexible.)

Is there any other way to stream or otherwise generate audio so that it participates correctly in 3D sound? What’s the best practice for generated or streamed audio?

I'm using OnAudioFilterRead to send procedurally generated audio (a sort of sequencer). It works well, but in Unity 5.3 there is a bug that continuously allocates memory at runtime for the temp buffer it passes you.

How were you able to use PCMReaderCallback to stream audio? I tried but encountered too many issues.
First, it was pre-buffering without me telling it to.
Second, it was only called for the length of the created AudioClip, but I need to generate an infinite stream.

Yeah, it pre-buffers in a way that I can't control, but it will keep reading data if the AudioSource it's playing on is set to loop. You can get an infinite stream that way; it will keep reading data forever.
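For anyone trying this, the setup is roughly the following sketch: create a streaming clip with a PCMReaderCallback and set the AudioSource to loop so Unity keeps requesting data past the clip's nominal length (the class and method names here are illustrative):

```csharp
using UnityEngine;

public class StreamingClipExample : MonoBehaviour
{
    AudioSource source;

    void Start()
    {
        source = GetComponent<AudioSource>();
        // The length (1 second here) is arbitrary for a streaming clip;
        // with loop = true the reader callback keeps being invoked.
        var clip = AudioClip.Create("Stream", 44100, 1, 44100, true, OnRead);
        source.clip = clip;
        source.loop = true; // the key bit: keeps the callback firing forever
        source.Play();
    }

    void OnRead(float[] data)
    {
        // Fill 'data' from your stream here; zeros = silence.
        for (int i = 0; i < data.Length; i++) data[i] = 0f;
    }
}
```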

That sucks about the memory leak. This stuff all seems pretty new and not well-tested.

Actually, apart from native plugins, these APIs are quite old.
I'll have a look at the looping approach, but if it's not called constantly it's no good for me. I need to play audio in real time; OnAudioFilterRead is called at regular intervals.
If you want to vote for a fix (maybe it will speed up the resolution), my bug report is here:
https://issuetracker.unity3d.com/issues/onaudiofilterread-allocates-memory-every-frame-instead-of-reusing-the-same-buffer
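To illustrate that regular-interval behavior, here is a minimal OnAudioFilterRead generator; the sine tone is just a stand-in for real sequencer output:

```csharp
using UnityEngine;

public class FilterReadTone : MonoBehaviour
{
    public float frequency = 440f;
    double phase;
    int sampleRate;

    void Awake()
    {
        sampleRate = AudioSettings.outputSampleRate;
    }

    // Called on the audio thread at regular intervals with an
    // interleaved sample buffer.
    void OnAudioFilterRead(float[] data, int channels)
    {
        double step = 2.0 * System.Math.PI * frequency / sampleRate;
        for (int i = 0; i < data.Length; i += channels)
        {
            float sample = (float)System.Math.Sin(phase);
            phase += step;
            for (int c = 0; c < channels; c++)
                data[i + c] = sample; // same tone on every channel
        }
    }
}
```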

Don't use AudioSource.timeSamples. It's not plugged into the actual audio back-end, and will basically return an approximate playback position calculated by multiplying AudioSource.time by AudioClip.frequency.

Calculate your sample position manually in your PCMReaderCallback. Store a position field somewhere, and increment it by the length of the sample buffer each time your callback is called. Additionally, implement the PCMSetPositionCallback, which is called every time the playback position is changed. This will allow you to keep track of the true sample position, and eliminate the delay on your custom audio stream.

private int position;

private void PCMReaderCallback(float[] data) {
    // ...
    // Fill 'data' with the next chunk of samples
    // ...

    // Advance the tracked playback position by one buffer
    position += data.Length;
}

private void PCMSetPositionCallback(int position) {
    // Unity reports a seek or restart; resync our counter
    this.position = position;
}

I’m having some trouble with using PCMReaderCallback. The callback is never called when the audioBuffer is filled. It should be called automatically by Unity. Here is a sample of my code:

private MemoryStream audioBuffer;

async void Start()
{
    Debug.Log("Start");
    audioBuffer = new MemoryStream();
    // Initialize the audio clip and buffer
    audioClip = AudioClip.Create("ElevenLabsTTS", 44100, 1, 44100, true, PcmReader);

    await ConnectToWebSocket();  // Some logic is implemented here to send the message to the web socket and receive a byte[] from it
    audioSource.clip = audioClip;
    audioSource.Play();
    Debug.Log("Playing audio clip");
}

private void PcmReader(float[] data)
{
    if (null == audioBuffer || audioBuffer.Length == 0) return;

    // Create a binary reader for the memory buffer
    using (BinaryReader reader = new BinaryReader(audioBuffer))
    {
        Debug.Log("audioBuffer is read");
        for (int i = 0; i < data.Length; i++)
        {
            if (audioBuffer.Position < audioBuffer.Length)
            {
                // Read a 16-bit sample from the memory buffer
                short sample = reader.ReadInt16();
                // Convert the sample to a float in the range -1 to 1 and store it in the data array
                data[i] = sample / 32768f;
            }
            else
            {
                // If there is no more data in the memory buffer, fill the rest of the data array with zeros
                data[i] = 0f;
            }
        }
    }

    if (audioBuffer.Position >= audioBuffer.Length) audioBuffer.SetLength(0);
}

When the message is received from the socket, which is the audio of “Hello”, this audioBuffer is filled.

websocket.OnMessage += (bytes) =>
{
    string message = Encoding.UTF8.GetString(bytes);
    // Debug.Log("OnMessage : " + message);
    var data = JsonUtility.FromJson<MessageData>(message);
    if (data.audio != null)
    {
        byte[] audioData = System.Convert.FromBase64String(data.audio);
        audioBuffer.Write(audioData);
        Debug.Log("Added audio data");
    }
};

At this point, PcmReader() should be called, but is not. What am I missing? Appreciate if anyone could help me here.

So just as a sanity check, I would validate that the created AudioClip properties match what you expect (frequency and channels).

Did you confirm that your function is indeed not called, and not just outputting silence? You could try feeding a sinusoidal tone in PcmReader.

Something that's possible, although I'm not sure if it's the case here: if there is no data to read, the AudioSource might stop. When no data is coming in, you could feed a little silence until data arrives.
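A quick way to run that sanity check: temporarily replace the body of PcmReader with a tone generator. If you hear the tone, the callback is firing and the problem is in your buffering; if not, the clip/source setup is at fault. The testPhase field is new here, added just for the test:

```csharp
double testPhase; // diagnostic state, added just for this test

private void PcmReader(float[] data)
{
    // Diagnostic only: write a 440 Hz sine instead of reading the buffer
    // (assuming a 44100 Hz mono clip, as in the code above).
    double step = 2.0 * System.Math.PI * 440.0 / 44100.0;
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = (float)System.Math.Sin(testPhase);
        testPhase += step;
    }
}
```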

A line of code was missing in my OnMessage callback, where I play the audio. I have added it below:

websocket.OnMessage += (bytes) =>
{
    string message = Encoding.UTF8.GetString(bytes);
    // Debug.Log("OnMessage : " + message);
    var data = JsonUtility.FromJson<MessageData>(message);
    if (data.audio != null)
    {
        byte[] audioData = System.Convert.FromBase64String(data.audio);
        audioBuffer.Write(audioData);
        Debug.Log("Added audio data");

        audioSource.Play();
    }
};

Now the PcmReader() callback is called and I hear some audio, but it is all garbage. The audio stream data I receive is in compressed MP3 format. How can I play this audio stream?

Most of the solutions I find online use NAudio, but that is for Windows applications. I need it to play on Android, iOS and web platforms. What options do I have?

Ouch… if your stream sends encoded data and you feed it directly into a PCM reader (which expects plain float32 interleaved samples), it's normal that you get garbage out of it.
Unity does not expose codecs directly, and even then I don't think what's available would be what's needed to stream and decode MP3 chunks. AVPro on the Asset Store, an excellent media package (though a bit pricey), would probably be able to handle it.
Other assets may be able to stream radio; that would require some research.

Otherwise, if you're able to download full MP3 files from your source, you could create an AudioClip from that data, and then the whole file would become accessible through Unity's features.
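For the full-file route, Unity's UnityWebRequestMultimedia can fetch an MP3 and decode it into an AudioClip. MPEG decode support varies by platform, and the URL here is a placeholder, so treat this as a sketch to verify on your targets:

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class Mp3Downloader : MonoBehaviour
{
    // Placeholder URL; replace with your service endpoint.
    public string url = "https://example.com/audio.mp3";

    IEnumerator Start()
    {
        using (var request = UnityWebRequestMultimedia.GetAudioClip(url, AudioType.MPEG))
        {
            yield return request.SendWebRequest();
            if (request.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError(request.error);
                yield break;
            }
            // Decoded clip; assign it to an AudioSource on this object.
            AudioClip clip = DownloadHandlerAudioClip.GetContent(request);
            var source = GetComponent<AudioSource>();
            source.clip = clip;
            source.Play();
        }
    }
}
```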

Just landed on this page as I'm researching solutions to play MP3 sound provided as a stream from a web service. There seems to be an asset that could be a good starting point, as it includes full source code: uAudio. I haven't had time to test it myself, but I thought you might be interested in the information.
