I’d like to mix (not concatenate) two different AudioClips together to create a new AudioClip. I realize I can play two different AudioClips simultaneously, but that’s not what I’m trying to accomplish. What I want is to output a new clip that is the combination of two different AudioClips of the same length/number of channels/frequency, so that playing it back sounds the same as listening to both clips at the same time. First of all, is there an easy way to do this within Unity that I might have missed? Maybe some part of the API that can accomplish this without having to mangle byte arrays?
If not: so far, I’ve been trying to do this programmatically in code. I create two byte arrays, one from each clip, combine the bytes together, then convert back to a float array and create a new clip. I’m assuming that Unity’s AudioClips are 32-bit float samples, since the API returns a float array. Perhaps that’s an incorrect assumption (Unity converts to Vorbis format by default). For argument’s sake, let’s assume both my clips are exactly the same length, and the sample rate is the same 44100 Hz. To reduce the possibility for error, let’s also assume both clips are single channel (i.e. mono).
Here’s my code to get the audio data from the two clips:
float[] floatSamplesA = new float[clipA.samples * clipA.channels];
clipA.GetData(floatSamplesA, 0);
byte[] byteArrayA = floatToByte(floatSamplesA);

float[] floatSamplesB = new float[clipB.samples * clipB.channels];
clipB.GetData(floatSamplesB, 0);
byte[] byteArrayB = floatToByte(floatSamplesB);

byte[] mixedBuffers = MixBuffers(byteArrayA, byteArrayB);
float[] mixedFloatArray = byteToFloat(mixedBuffers);

AudioClip result = AudioClip.Create("Combine", mixedFloatArray.Length, clipA.channels, clipA.frequency, false);
result.SetData(mixedFloatArray, 0);
My code to convert a float array into a byte array is straightforward:
private byte[] floatToByte(float[] floatArray)
{
    byte[] byteArray = new byte[floatArray.Length * 4];
    for (int i = 0; i < floatArray.Length; i++)
    {
        float currentFloat = floatArray[i];
        byte[] float2byte = BitConverter.GetBytes(currentFloat);
        Assert.IsTrue(float2byte.Length == 4);
        int offset = 4 * i;
        byteArray[0 + offset] = float2byte[0];
        byteArray[1 + offset] = float2byte[1];
        byteArray[2 + offset] = float2byte[2];
        byteArray[3 + offset] = float2byte[3];
    }
    return byteArray;
}
After I’ve got my two byte arrays, I thought I might be able to mix them together by doing something like this:
private byte[] MixBuffers(byte[] bufferA, byte[] bufferB)
{
    byte[] array = new byte[bufferA.Length];
    for (int i = 0; i < bufferA.Length; i++)
    {
        byte byteA = bufferA[i];
        byte byteB = bufferB[i];
        byte byteC = (byte)(((int)byteA + (int)byteB) >> 1);
        array[i] = byteC;
    }
    return array;
}
The code to convert the byte array back into a float array (to feed into an audio clip) is essentially the reverse of the floatToByte function:
private float[] byteToFloat(byte[] byteArray)
{
    Assert.IsTrue(byteArray.Length % 4 == 0);
    float[] floatArray = new float[byteArray.Length / 4];
    for (int i = 0; i < floatArray.Length; i++)
    {
        int offset = 4 * i;
        byte[] byteArrayChunk = new byte[]
            { byteArray[0 + offset], byteArray[1 + offset], byteArray[2 + offset], byteArray[3 + offset] };
        floatArray[i] = BitConverter.ToSingle(byteArrayChunk, 0);
    }
    return floatArray;
}
The result is definitely a mix of the two clips, but there is a ton of static. Perhaps precision loss when I’m going from 32-bit floats to byte arrays and converting back? If what I’m trying to do (averaging the raw bytes) is naturally going to introduce a lot of noise/static, how would I go about creating a two-channel clip and adding each source clip to its own channel in the new clip?
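To clarify the second question, here is my rough guess at what the two-channel version would look like, assuming SetData expects interleaved samples (left, right, left, right, ...), which I haven’t verified:

```csharp
// Untested sketch: put clipA on the left channel and clipB on the right,
// assuming Unity stores multichannel audio as interleaved samples.
float[] interleaved = new float[floatSamplesA.Length * 2];
for (int i = 0; i < floatSamplesA.Length; i++)
{
    interleaved[2 * i]     = floatSamplesA[i]; // left channel
    interleaved[2 * i + 1] = floatSamplesB[i]; // right channel
}
// AudioClip.Create takes the sample count per channel, not the total.
AudioClip stereo = AudioClip.Create("TwoChannel", floatSamplesA.Length, 2, clipA.frequency, false);
stereo.SetData(interleaved, 0);
```

Is this the right direction, or does Unity lay out channels some other way?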