[Official] Audio Improvements

Hello!

As you have probably seen in other threads, we like to check in on the direction we are taking with our technology to make sure we are best serving you. In this thread we would like to discuss the direction of Audio with you. Instead of restating the long-term roadmap and short-term projects, I'd rather just step back and listen to what you have experienced and what you need. Knowing a bit about you also helps us better understand your feedback, so please consider the following questions to get this discussion going.

Who are you?

What kind of game are you trying to build or would like to build?

How does Audio fit into that? What use-cases do you have?

What are the GOOD things about the Audio system that you like?

What are the BAD things about the Audio system that you dislike?

How can we make it BETTER?

Thanks!

Erik

Hi,
We are Audioplum studio (http://audioplumstudio.com/) and we have worked on audio production and its integration in Unity on several projects, the latest being A.B.C. Murders. After a year of working professionally with the available tools, we decided to create our own, which we published on the Asset Store under our publisher's name, Baranger-Holsnyder. We tried to fulfill all audio integration needs by "filling in the blanks" in Unity's audio, so as to have a versatile and complete tool made by sound designers for sound designers. It covers everything Unity is missing, and if Unity were to make this part of the engine itself, our plugin would no longer need to exist, but you would have everything an audio professional can ask of a game engine.

Best regards,

Alexandre Baranger ( Director/Audio Producer ) and Sacha Holsnyder ( Sound Designer/Developer )

Where was this stated in the first place? The main roadmap doesn’t have anything about Audio on it, and hasn’t for a long time, even though this was mentioned at Unite Boston last fall.

That out of the way, on to a response:

Who are you?
I’m the lead developer of Koreographer.

What kind of game are you trying to build or would like to build?
As a side project I’m working on a musical spaceship shooter where all interactions are based on features of the music (the game uses Koreographer).

How does Audio fit into that? What use-cases do you have?
We currently use a multi-layer music system that we built on top of Audio Sources. Koreographer watches the state of Audio playback and sends signals to the game about what’s happening in them so the game can trigger events. The multiple layers allow the player to interact with the mix and overall experience. We also support pitch adjustments as speeding up/slowing down audio playback causes the game itself to speed up/slow down.
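
For context, here is a minimal illustration of the general pattern (not Koreographer's actual code; the class and event names are made up): poll AudioSource.timeSamples each frame and raise an event whenever the read head crosses a pre-authored sample position.

```csharp
using System;
using UnityEngine;

// Illustrative only: polls the AudioSource read head each frame and fires an
// event when playback crosses a pre-authored sample position (e.g. a beat).
public class SampleEventWatcher : MonoBehaviour
{
    public AudioSource source;          // the music layer being watched
    public int[] eventSamplePositions;  // sorted sample offsets within the clip

    public event Action<int> OnSampleEvent;

    int nextEventIndex;
    int lastSamplePosition;

    void Update()
    {
        if (source == null || !source.isPlaying)
            return;

        int currentSample = source.timeSamples;

        // Handle a loop or backwards seek by rewinding the event cursor.
        if (currentSample < lastSamplePosition)
            nextEventIndex = 0;

        // Fire every event whose sample position has been passed this frame.
        while (nextEventIndex < eventSamplePositions.Length &&
               eventSamplePositions[nextEventIndex] <= currentSample)
        {
            if (OnSampleEvent != null)
                OnSampleEvent(eventSamplePositions[nextEventIndex]);
            nextEventIndex++;
        }

        lastSamplePosition = currentSample;
    }
}
```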

What are the GOOD things about the Audio system that you like?
This question is very difficult to answer because we tend to take for granted what we already have that works. As such this is a short-list of things I am thankful for:

  • File type support (thanks, FMOD Low-Level!).
  • File loading support seems to work flawlessly.
  • Basic AudioSource API is simple and intuitive.

What are the BAD things about the Audio system that you dislike?

  • Recommendations about audio preparation are non-existent and lead to confusion for teams. In conversations with folks on the Audio Team, I've heard things like "all audio imported into Unity should be uncompressed (wav/pcm)". The documentation explains the reasoning behind this, but it never actually recommends authoring in WAV and letting the system deal with compression internally. As a result, a lot of teams deliver audio to Unity as compressed (and, importantly, lossy) file types.

  • Documentation for AudioSource.time and AudioSource.timeSamples does not mention that the returned values are not the precise playback time of the audio but, rather, the position of the audio buffer "read" head, and can therefore return identical values across consecutive frames at high frame rates.

  • This note in AudioSource.time is only somewhat helpful: “Be aware that: On a compressed audio track position does not necessary reflect the actual time in the track. Compressed audio is represented as a set of so-called packets. The length of a packet depends on the compression settings and can quite often be 2-3 seconds per packet.” Here are a few problems:

      • No mention of whether or not this affects the timeSamples version as well (a similar note does not appear in timeSamples).

      • No description of what the returned time actually represents. Does it equate to the point of the last sample that was buffered? The last sample that was decompressed?

      • Is there a way to mitigate this? Any recommended strategy for working around such a big gap?

      • As "Compressed audio" is cryptic shorthand for "MP3/Vorbis", does this affect both of these? Only some? Is the problem platform-specific?

      • This quote is followed by a "See Also: timeSamples variable" on the same line as the warning, which makes them feel connected. I'm willing to bet that this is simply a related field and should exist in its own paragraph (as is done with the related message on the AudioSource.time doc).

  • Mod file issues that have existed with no word from the audio team for quite a while.

  • Very little metadata about what’s going on with audio:

      • Did the audio loop?

      • Did the audio jump/seek?

  • Sample-accurate timing seems effectively impossible. You can schedule audio but you cannot pin a sample to a location along the absolute timeline (this would require filling a buffer with zeros up until the location specified by the "pin", at which point actual audio samples would begin filling).

  • When scheduling one AudioClip playback to match another, you have to time them both against the AudioDSP timeline. If you attempt to change the playback speed ("pitch") of the already-playing audio, you need to reschedule the one that is attempting to match the other. This is particularly annoying if you're doing a pitch-ramp (see the sketch after this list).

  • Dealing with synchronization of multiple audio layers is susceptible to threading issues (unless you essentially rebuild using PCM callbacks what’s happening at a lower level already) as adjustments to different AudioSources must happen serially on the Main thread.

  • No good control for AudioClip playback in the Editor. Basic playback is indeed possible but you do not get the same control over pitch, effects, etc. that you do on the game side. You can get this stuff at edit time but it requires adding a “no save” object to the scene and dealing with that complication.
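
To make the scheduling/rescheduling item above concrete, here is a rough sketch of the dance required today. PlayScheduled, SetScheduledStartTime, and AudioSettings.dspTime are the real APIs; the class and method names are just illustrative.

```csharp
using UnityEngine;

// Rough sketch: schedule sourceB to start exactly when sourceA's clip ends,
// and redo the math whenever sourceA's pitch changes mid-ramp.
public class GaplessFollower : MonoBehaviour
{
    public AudioSource sourceA; // already playing
    public AudioSource sourceB; // scheduled to follow

    bool scheduled;

    public void ScheduleFollower()
    {
        sourceB.PlayScheduled(ComputeEndDspTime());
        scheduled = true;
    }

    // Must be called after every pitch change on sourceA; the previously
    // scheduled start time no longer matches reality. Only meaningful while
    // sourceB has not actually started playing yet.
    public void OnPitchChanged()
    {
        if (!scheduled)
            return;
        sourceB.SetScheduledStartTime(ComputeEndDspTime());
    }

    double ComputeEndDspTime()
    {
        AudioClip clip = sourceA.clip;
        int samplesRemaining = clip.samples - sourceA.timeSamples;

        // Seconds of output remaining, accounting for the current pitch.
        // Note: timeSamples is the buffer read head, so audio already
        // buffered ahead of it is not accounted for here.
        double secondsRemaining =
            samplesRemaining / (double)clip.frequency / sourceA.pitch;

        return AudioSettings.dspTime + secondsRemaining;
    }
}
```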

How can we make it BETTER?

  • When retrieving time (whether samples or seconds), add an option for this to be estimated. Currently it simply tells you "the audio buffer has read this far". The estimation would calculate how many of the buffered samples have likely made it to the speaker (a rough sketch of the kind of estimation possible today appears after this list).

  • Improve documentation of what’s going on with the AudioSource.time and AudioSource.timeSamples APIs.

  • Improve Audio Synchronization capabilities.

  • Allow one AudioSource to synchronize/schedule against another AudioSource.

  • Allow scheduled playback to be pinned to a specific location along the timeline. This should properly handle pins that occur in the past (to handle situations where someone attempts to pin something to now-time rather than slightly in the future). By this I mean that if I say “pin sample 50 to now” and the buffer read is already 100 samples ahead of what the DSP time was reported as “now” then the buffer will begin filling at sample 150, as though it had already handled buffering the previous 100 samples.

  • Provide an immutable struct with property updates that can be passed to the Audio thread, ensuring that changes to multiple audio sources happen at the same time on the Audio thread. Something like "AdjustParametersSynced"…

  • Expand the Editor playback controls of AudioClips (getting AudioSource equivalence would be amazing). This would allow more interesting/useful edit-time tools to be developed.
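
As a stopgap for the first suggestion above, this is roughly the kind of estimation a script can do today (an illustrative sketch, not a proposed API): remember the last timeSamples value and the dspTime at which it changed, then extrapolate between buffer-sized jumps.

```csharp
using UnityEngine;

// Illustrative sketch: smooths the stair-stepped AudioSource.timeSamples value
// by extrapolating from the DSP clock between buffer updates.
public class EstimatedPlayhead : MonoBehaviour
{
    public AudioSource source;

    int lastReportedSamples;
    double dspTimeAtLastReport;

    public double EstimatedTimeSamples { get; private set; }

    void Update()
    {
        if (source == null || !source.isPlaying)
            return;

        int reported = source.timeSamples;

        if (reported != lastReportedSamples)
        {
            // The read head advanced: resynchronize to the reported value.
            lastReportedSamples = reported;
            dspTimeAtLastReport = AudioSettings.dspTime;
        }

        // Extrapolate how far playback has likely advanced since the report.
        double elapsed = AudioSettings.dspTime - dspTimeAtLastReport;
        EstimatedTimeSamples = reported +
            elapsed * source.clip.frequency * source.pitch;
    }
}
```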

I hope this is helpful!


SonicBloomEric, I agree with everything you said. However, I believe rescheduling is worse than you think. Referring to this:

  • When scheduling one AudioClip playback to match another, you have to time them both against the AudioDSP timeline. If you attempt to change the playback speed ("pitch") of the already-playing audio, you need to reschedule the one that is attempting to match the other. This is particularly annoying if you're doing a pitch-ramp.

I have a thread showing that it is impossible to guarantee accuracy (for gapless playback) of the rescheduled clip based on the current position of the audio, due to the fact that AudioSource.time is a float while the DSP time used to calculate the schedule time is a double. Math precision gets lost, and it will often give you 3 slightly different times to schedule when asked on 3 consecutive frames! Here's my thread.

http://forum.unity3d.com/threads/why-is-the-calculation-of-end-time-for-an-audio-clip-always-slightly-different-every-frame.354579/

Basically, if you change the pitch or position of a currently playing clip, and want to reschedule the next one to be gapless, it’s a crapshoot. Sometimes there will be a slight gap. Sometimes there will be a slight overlap. Sometimes it will be gapless (or close enough to not be recognizable as non-gapless).
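
To see the difference in isolation (a hypothetical helper echoing the scheduling sketch earlier in the thread, not Master Audio code): computing the remaining time from AudioSource.time goes through a float and wobbles from frame to frame, whereas deriving it from the integer timeSamples keeps the math in doubles until the very end.

```csharp
using UnityEngine;

// Hypothetical illustration of the float-vs-double precision issue above.
public static class AudioScheduleMath
{
    // Wobbly: AudioSource.time and AudioClip.length are floats, so the
    // subtraction and division inherit float precision and the result
    // drifts slightly between frames.
    public static double EndDspTimeFromFloatTime(AudioSource source)
    {
        float secondsRemaining = (source.clip.length - source.time) / source.pitch;
        return AudioSettings.dspTime + secondsRemaining;
    }

    // Steadier: timeSamples is an int, so the math can stay in doubles.
    public static double EndDspTimeFromSamples(AudioSource source)
    {
        int samplesRemaining = source.clip.samples - source.timeSamples;
        return AudioSettings.dspTime +
               samplesRemaining / (double)source.clip.frequency / source.pitch;
    }
}
```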

Would also like to see:

  1. Ability to loop a clip X times instead of just no loop and infinite loop.
  2. Built in events on the Audio Source we can subscribe to telling us: did the audio stop / loop back to beginning.
  3. Ability to loop just part of the clip. Making intro/looped section possible in a single clip.
  4. To fix the math error rescheduling problem I stated above, would like a property in AudioSource to give the always correct AudioDSP end time of the current clip if it is not looped, based on its current position and pitch.

These are all issues that many of the 6,000+ users of our Master Audio plugin encounter in some cases or are asking for.


This would be great but we’d still have potential issues with scheduling reverb tails to apply to the sub-section loops. This would be handled if we could schedule an AudioSource to play along with the timeline of another AudioSource, and not just the Audio DSP timeline.

One more for “How can we make it BETTER?”:

  • AudioSource.time and all related time-in-seconds methods should have a version that uses double as the type for all the precision issues mentioned by @jerotas above [and more!].

Hope this helps.

What I like:

  1. API is easy.
  2. Audio Mixer in Unity 5 is really cool.

Dislike:
Well, the sheer number of dislikes (mostly things missing) is why we built Master Audio. The number of features not in "core Unity audio" that are commonly used and needed is staggering. I won't enumerate them here because that's a source of income for me.

A notable one that our plugin does not cover and probably never will:

  1. Ability to do split-screen games with multiple Audio Listeners.

Also, audio occlusion would be a great thing to add (we plan on adding it to our plugin regardless).

Who are you?

Developer of the Fabric audio toolset

What are the GOOD things about the Audio system that you like?

  • Simple, straightforward audio API
  • Good cross platform compression support
  • Audio Mixer is implemented well
  • Native Audio Plugin SDK
  • Audio clip preload/load in background options are very useful

What are the BAD things about the Audio system that you dislike?

  • Threading issues: we have games that sometimes experience stalls when Fabric calls certain AudioSource functions. I believe this is probably due to FMOD's somewhat "messy" multi-threaded support.

  • As mentioned already, scheduling of audio sources with variable pitch changes can be very tricky. In Fabric this is handled by using the Audio DSP timeline entirely (we can't use AudioSource.time due to the math errors jerotas mentioned) and recalculating the next transition internally, but that requires a small amount of latency on first play. It's not ideal, but it works, unless the audio thread is starved, at which point things get out of sync, since there is no mechanism to detect that.

  • No easy way to load audio clips that are not referenced by a script and live outside a Resources folder. Yes, this can be done using the WWW class, but its audio support is not very good.
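
For reference, the WWW route looks something like this (a minimal sketch; the file path and .ogg name are just assumptions). It works, but you get very little control over how the clip is streamed or compressed.

```csharp
using System.Collections;
using UnityEngine;

// Minimal sketch of loading a clip from disk at runtime via WWW.
public class RuntimeClipLoader : MonoBehaviour
{
    public AudioSource source;

    IEnumerator Start()
    {
        // Assumed path to an .ogg file outside the project/Resources.
        string url = "file://" + System.IO.Path.Combine(
            Application.persistentDataPath, "music.ogg");

        WWW www = new WWW(url);
        yield return www;

        if (!string.IsNullOrEmpty(www.error))
        {
            Debug.LogError("Failed to load clip: " + www.error);
            yield break;
        }

        // threeD: false, stream: true. Few other knobs are available.
        AudioClip clip = www.GetAudioClip(false, true);
        source.clip = clip;
        source.Play();
    }
}
```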

How can we make it BETTER?

The suggestions by SonicBloomEric are spot on; on top of them I will add…

  • Ability to set channel gains on multi-channel audio clips.
  • Ability to query an audio source's final gain (after attenuation is applied).
  • Allow passing different types of parameters to the Native Audio Plugins, not just floats.
  • Allow Native Audio Plugins for audio sources as well.
  • Multi-listener support; this is already supported internally by FMOD, so in theory(!) it should be easy to expose.

I’m the author of RhythmTool.

For the stuff I’m working on I don’t need a whole bunch of features. Just something that lets me play and analyze samples. In that regard, the current system is easy and straightforward to use. It’s easy to analyze and manipulate AudioClips.

The main problem I have come across is the limited support for loading audio files at runtime. Loading MP3 files is only supported on mobile platforms, for example. That’s probably due to licensing etc., but it should be possible to use the OS’s decoder, like Bass.dll does.

Other than that I would really like to see a GetSpectrum method that would return the spectrum of a specified part of an AudioClip, and not just the spectrum of what’s currently being played by an AudioSource. That would make it a lot easier to analyze spectrum data ahead of time. Right now you’d have to either play a muted version or roll your own FFT.
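
For anyone hitting the same wall, the "roll your own" route starts with AudioClip.GetData; here is a sketch of the data-extraction half (the FFT itself is left to your own code, and the clip has to use the Decompress On Load load type for GetData to return real samples).

```csharp
using UnityEngine;

// Sketch: extract a mono window of samples from an AudioClip for offline
// analysis. The clip must use the "Decompress On Load" load type.
public static class OfflineAnalysis
{
    public static float[] GetMonoWindow(AudioClip clip, int startSample, int windowSize)
    {
        int channels = clip.channels;
        float[] interleaved = new float[windowSize * channels];
        clip.GetData(interleaved, startSample);

        // Average the channels down to mono.
        float[] mono = new float[windowSize];
        for (int i = 0; i < windowSize; i++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
                sum += interleaved[i * channels + c];
            mono[i] = sum / channels;
        }

        // Feed 'mono' to your own FFT; Unity offers no offline equivalent
        // of AudioSource.GetSpectrumData.
        return mono;
    }
}
```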

Hi,

I've pretty much only used the basic Unity audio features (as in, drag & drop a wav/ogg in and it just works…), but if there is a wishlist, then playing C64 SID files would be nice!

One little thing I came across just yesterday was that the workflow for the AudioMixer is pretty poor. Unless I’ve missed something extremely basic, it appears that you cannot audition your Mixer setup unless you’re actively playing your game? If triggering/controlling a mix with the editor in Edit Mode isn’t something that’s already supported then it really should be.

I imagine, for instance, that I should be able to play a sound effect through a bus in a looping manner and then play with Effect settings in real time to get a sense for what the changes are actually doing. Currently it seems that the best way to simulate this would be to build a test scene for this very purpose and turn on “Edit in Play Mode”… which seems like way too much work for something that should simply be built in.

Based on this tutorial video it appears that the normal workflow is to create a dummy scene, add an AudioSource with Play On Awake and Looping enabled and then put the Audio Mixer panel in “Edit in Play Mode”. This feels counterintuitive and actually goes against everything we’re taught throughout the rest of Unity (don’t make edits you care about in play mode). We don’t have to be in Play Mode to create and edit animations, right? Why is audio any different here?

Who are you?
Developer of the ELIAS Adaptive Music Engine + A Unity Plugin for it.

What kind of game are you trying to build or would like to build?
A Music engine that allows for much better handling of music in games.

What are the GOOD things about the Audio system that you like?
It’s easy to do stuff such as output a custom stream.
Unity handles resampling of both sample rates and channel counts.
OnAudioFilterRead saves the day for the most part, but using it forgoes the other benefits just mentioned.
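
For readers who have not touched it, this is roughly what that looks like (a toy sine generator, not ELIAS code; depending on the Unity version you may need the attached AudioSource to be playing for the filter to be processed, which ties into quirks discussed later in this thread).

```csharp
using UnityEngine;

// Toy example of streaming procedural audio through OnAudioFilterRead.
[RequireComponent(typeof(AudioSource))]
public class SineStream : MonoBehaviour
{
    public float frequency = 440f;
    public float gain = 0.25f;

    double phase;
    int sampleRate;

    void Awake()
    {
        // Query on the main thread; the callback runs on the audio thread.
        sampleRate = AudioSettings.outputSampleRate;
    }

    // Called on the audio thread with an interleaved buffer.
    void OnAudioFilterRead(float[] data, int channels)
    {
        double increment = frequency * 2.0 * Mathf.PI / sampleRate;

        for (int i = 0; i < data.Length; i += channels)
        {
            phase += increment;
            if (phase > 2.0 * Mathf.PI)
                phase -= 2.0 * Mathf.PI;

            float sample = gain * Mathf.Sin((float)phase);
            for (int c = 0; c < channels; c++)
                data[i + c] = sample; // overwrite (or mix into) the incoming buffer
        }
    }
}
```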

What are the BAD things about the Audio system that you dislike?
Missing a few parameters in C# for streamed sounds: the default stream buffer in FMOD seems to be 400 ms, and I have been unable to find a way to change that for streamed audio, even if I set the length of the audio clip to be smaller than that (in which case it simply reads smaller amounts at a time until it has gathered the 400 ms it wants).

How can we make it BETTER?
For streamed sounds, don't ask for 400 ms of audio at a time. Either expose the stream buffer size as a parameter or simply cap it at the size-in-samples value passed in.
It's only possible to get decoded data from an audio clip if it's decompressed on load. I would like to be able to get the raw (ogg) data "streamed", and/or get raw data from streamed samples. (I can understand that this fix would require a bit of work.)
Allow OnAudioFilterRead to run before the channel and sample rate conversions. Currently the "filters" need to be able to handle a lot of different combinations.
As mentioned by Tazman, allow audio sources to be created in the Native audio plugin.
Allow this without a crazy hack: native audio plugins in unity 5.0
Allow the Native Audio Plugins to use more than simply floats. (Perhaps the same list as FMOD supports, as the Data parameter type would allow more custom behavior.)

I think you may be looking for DSP Buffer Size stuff. Check out the Audio Manager for configuration settings and access to adjusting it.

Oooh, seconded!

Oh no, that won’t make any difference! If you look at FMOD’s code, you’ll see that when creating a streamed sound you choose how large the stream buffer is supposed to be, and that is by default 400ms.
Your suggestion matters for OnAudioFilterRead, but not for OnAudioRead… (Unless they have fixed this with a somewhat recent version of Unity of course).

Ahh. This I could not speak to as I have not tried to use the OnAudioRead. I guess I’m used to using the PCMReaderCallback. And it is (was??) my understanding that the size of requested audio is dependent upon the DSP Buffer Size. I actually didn’t even know that MonoBehaviour.OnAudioFilterRead was a thing…!

What is the “OnAudioRead” method/function to which you are referring?

Who are you?
Developer of an "unofficial" Unity plugin for the Superpowered audio SDK, and of the Alive music engine (no link; it's an internal project).

What kind of game are you trying to build or would like to build?
It’s a secret right now, but it will incorporate rhythm mini games

How does Audio fit into that? What use-cases do you have?
The Alive engine plays tracker modules with automation, note events, instruments, sample swapping at runtime, and more. It is mostly used to give better feedback and immersion in the game.
I rely mostly on OnAudioFilterRead and async Resources loading at runtime.
I am also using the plugin mentioned above, but that's more of a pet project and born out of necessity right now (see bug below).

What are the GOOD things about the Audio system that you like?
I love that it's simple and just works. Everything is easy to set up in seconds.

What are the BAD things about the Audio system that you dislike?
Too little access to low level things.
Some off the top of my head:
1- Cannot access an audio file's binary data if the audio is recognized by Unity.
Please allow reading recognized files too! Right now the only way is to rename them to .bytes, so you either read a file yourself or you let Unity read it; you cannot have both (a tiny sketch of the .bytes workaround appears after this list).
Also allow reading the audio data as it is, as bytes or words, so we avoid converting back and forth between floats and bytes.

2-onAudioFilterRead has lots of bugs.
The last one I reported is that it allocates memory for the buffer on every call.
Another is that if you create an AudioClip (I haven't reported this one yet, sorry) and attach it to the AudioSource that has the OnAudioFilterRead, the mixer is totally ignored. No mute, no bypass.
There's another I can't remember; it was probably only in Unity 4.

3- Please allow PCMReaderCallback (is that the OnAudioRead you mention?) to act exactly like OnAudioFilterRead. Right now it prebuffers something at initialization, and it stops being called after the length you specified. It would be nice to have it called indefinitely, so that you can have a "virtual" audio clip that is actually generated by code elsewhere. With OnAudioFilterRead you lose this abstraction.

4- The native audio plugin system is a nightmare. There is so little info on it that I gave up.
Plus the workflow is too lengthy; allow the code to be integrated and compiled in Unity where possible, just like iOS plugins, where you can include .m, .mm, .cpp, and .h files in the project.
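
For anyone unfamiliar with the workaround in point 1, it looks roughly like this (a sketch; the file and path names are made up, and the file has to be renamed with a .bytes extension so Unity imports it as a TextAsset instead of an AudioClip).

```csharp
using UnityEngine;

// Sketch of the ".bytes" workaround: Unity imports the renamed file as a
// TextAsset, so the raw (still compressed) bytes are reachable from script,
// but Unity no longer treats it as audio, so you must decode it yourself.
public static class RawAudioBytes
{
    // e.g. a file stored as Resources/Audio/song.ogg.bytes is loaded
    // by the name "Audio/song.ogg".
    public static byte[] Load(string resourcePath)
    {
        TextAsset asset = Resources.Load<TextAsset>(resourcePath);
        return asset != null ? asset.bytes : null;
    }
}
```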

With OnAudioFilterRead you don't have to resample sample rates or channels. Maybe you did something you shouldn't have? OnAudioFilterRead is quite picky and buggy.
For example, you must not create and attach an AudioClip or it will bypass the mixer (while in Unity 4 you were required to create one or it would have issues).
Anyway, there is currently a bug in that method that allocates memory for the buffer on every call, causing the GC to fire continuously, so beware.

I get around this by creating a "buffer" audio clip and setting the AudioSource to looping. If I want no audio played back for a while, I can either stop the AudioSource or simply pass zeros to the buffer. Have you tried something like this?
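
Something along these lines, a trimmed-down sketch (the queue-based feeding is illustrative, not my actual setup): a streaming AudioClip whose reader callback drains a queue of samples and emits silence when the queue is empty, with the AudioSource set to loop so the callback keeps firing.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Trimmed-down sketch of the "buffer clip" workaround.
[RequireComponent(typeof(AudioSource))]
public class BufferClipPlayer : MonoBehaviour
{
    const int SampleRate = 44100;
    const int Channels = 2;

    readonly Queue<float> pending = new Queue<float>();
    readonly object sync = new object();

    void Start()
    {
        // One second of "virtual" clip; looping keeps the reader callback alive.
        AudioClip clip = AudioClip.Create(
            "BufferClip", SampleRate, Channels, SampleRate,
            true, OnAudioRead); // stream = true

        AudioSource source = GetComponent<AudioSource>();
        source.clip = clip;
        source.loop = true;
        source.Play();
    }

    // Call from gameplay code to enqueue interleaved samples.
    public void Enqueue(float[] samples)
    {
        lock (sync)
            foreach (float s in samples)
                pending.Enqueue(s);
    }

    // Reader callback: filled with queued audio, or zeros (silence) when empty.
    void OnAudioRead(float[] data)
    {
        lock (sync)
        {
            for (int i = 0; i < data.Length; i++)
                data[i] = pending.Count > 0 ? pending.Dequeue() : 0f;
        }
    }
}
```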

But does it get called at a constant rate like OnAudioFilterRead?
Somebody told me it doesn't. I need to play audio in real time.

I have no idea how OnAudioFilterRead works just yet - never looked into it. The PCMReaderCallback is called from the Audio thread with the exception of the initial “Play” call - in that case it seems to fill up the initial playback buffers from the main thread before handing things to the audio thread. It doesn’t get things “in real time” as it uses the buffers, of which the number and size can be configured in the Audio Manager. The lower the number & smaller the size of these buffers, the lower the latency. That said, you also increase the likelihood that you’ll end up with voice starvation.

I would have completely guessed that OnAudioFilterRead works the same way (or similar enough as makes little difference in the end - especially if it’s called on the Main thread, rather than the Audio thread).
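
For reference, those buffer settings can also be inspected from script, not just in the Audio Manager (a small sketch; it assumes Unity 5's AudioSettings.GetConfiguration is available).

```csharp
using UnityEngine;

// Sketch: log the DSP buffer settings that drive callback granularity/latency.
public class DspBufferInfo : MonoBehaviour
{
    void Start()
    {
        // Assumes Unity 5+: AudioConfiguration mirrors the Audio Manager settings.
        AudioConfiguration config = AudioSettings.GetConfiguration();

        Debug.Log("DSP buffer size (samples): " + config.dspBufferSize);
        Debug.Log("Output sample rate: " + config.sampleRate);

        // Rough per-buffer latency contribution in milliseconds.
        float bufferMs = 1000f * config.dspBufferSize / config.sampleRate;
        Debug.Log("Approx. latency per buffer: " + bufferMs + " ms");
    }
}
```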

Allow us to actually inspect the state of the AudioSource. It’s really hard to understand why there isn’t an “isPaused” flag (or, better yet, “AudioSource.playbackState”, which could be “Playing | Paused | Stopped”). We have the ability to call Play, Pause, UnPause, and Stop. There is a difference between Play and UnPause and the one “isPlaying” flag simply isn’t enough.
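
In the meantime, the best you can do is track the state yourself with a thin wrapper, something like this illustrative sketch (not a real Unity API):

```csharp
using UnityEngine;

// Illustrative wrapper that tracks the state Unity does not expose.
public enum PlaybackState { Stopped, Playing, Paused }

public class TrackedAudioSource
{
    readonly AudioSource source;

    public PlaybackState State { get; private set; }

    public TrackedAudioSource(AudioSource source)
    {
        this.source = source;
        State = PlaybackState.Stopped;
    }

    public void Play()    { source.Play();    State = PlaybackState.Playing; }
    public void Pause()   { source.Pause();   State = PlaybackState.Paused;  }
    public void UnPause() { source.UnPause(); State = PlaybackState.Playing; }
    public void Stop()    { source.Stop();    State = PlaybackState.Stopped; }

    // Note: this cannot see state changes made directly on the AudioSource,
    // nor a clip finishing on its own, which is exactly why a built-in
    // AudioSource.playbackState would be better.
}
```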

I investigated further and, as I suspected, PCMReaderCallback is not tied to the sample rate; it is file-buffer based.
If I specify a length of 2 samples, it asks me for 4 samples (probably because that's the minimum for stereo); if I specify 4096 samples, it asks for 4096 about 99% of the time and sometimes a little less. I can't work out the rationale, but it's not tied to the sample rate.

onAudioFilterRead is the way to go for outputting a wave stream to the soundcard.

So, I stand by my request. Could you guys implement a PCMReaderCallback variant that acts exactly like OnAudioFilterRead, using the sample rate and DSP buffer size? That way we could have a "virtual" AudioClip to pass around in the project.