Compare two Audio Clips

The title explains itself, but here is the context: I am creating a voice recognition program. One key element is comparing two audio clips with each other. How would I go about doing this?

Thanks in advance!

If we’re talking about the waveform data there are different metrics to consider:

  • The length of the audio
  • The number of channels
  • The sample width
  • The framerate
  • The amplitudes
  • The tempo
  • The frequencies

By the looks of it, Unity actually does a lot of the work for you here.

AudioClip seems to cover the trivial items (length, channels, frequency, samples), while AudioSource provides FFT functionality for frequency analysis through GetSpectrumData.
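To make the frequency comparison concrete, here is a minimal, language-neutral sketch in Python rather than Unity code. It assumes both clips are mono lists of float samples at the same sample rate, and it uses a naive DFT purely for illustration (in Unity you would take the spectrum from the engine instead). The names magnitude_spectrum and spectral_similarity are hypothetical helpers, and cosine similarity is just one reasonable choice of score.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (only suitable for short windows)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


def spectral_similarity(a, b):
    """Cosine similarity of two magnitude spectra.

    Values near 1.0 mean the same spectral shape, values near 0.0 mean
    no shared frequency content. Overall volume does not affect the score.
    """
    n = min(len(a), len(b))  # assumption: compare only the overlapping part
    sa = magnitude_spectrum(a[:n])
    sb = magnitude_spectrum(b[:n])
    dot = sum(x * y for x, y in zip(sa, sb))
    norm = math.sqrt(sum(x * x for x in sa)) * math.sqrt(sum(y * y for y in sb))
    return dot / norm if norm else 0.0  # silent input: report no similarity


def sine(bin_index, n=64, gain=1.0):
    """Test tone that completes exactly bin_index cycles in n samples."""
    return [gain * math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]
```

For example, spectral_similarity(sine(4), sine(4, gain=0.5)) comes out very close to 1, while spectral_similarity(sine(4), sine(10)) comes out very close to 0. For real speech you would compare short windows rather than whole clips.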

For amplitudes, you can simply sum the absolute differences between the two curves at each sample, though you might want to normalize the volume first if you only care about the shape.
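A minimal sketch of that amplitude comparison, again in Python rather than Unity code. It assumes both clips are mono lists of float samples, share a sample rate, and are already aligned in time. The helper names are hypothetical, peak normalization is one simple way to do the volume normalization, and the sum is divided by the sample count so the score does not grow with clip length.

```python
def peak_normalize(samples):
    """Scale samples so the largest absolute value is 1.0 (silence is left unchanged)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


def amplitude_distance(a, b, normalize=True):
    """Mean absolute difference between two clips, sample by sample.

    0.0 means identical curves. Extra samples in the longer clip are
    ignored, which is an assumption of this sketch, not a rule.
    """
    if normalize:
        a, b = peak_normalize(a), peak_normalize(b)
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both clips need at least one sample")
    return sum(abs(x - y) for x, y in zip(a, b)) / n


quiet = [0.0, 0.25, -0.5, 0.25]
loud = [0.0, 0.5, -1.0, 0.5]   # same shape at twice the volume

print(amplitude_distance(quiet, loud))                   # 0.0
print(amplitude_distance(quiet, loud, normalize=False))  # 0.25
```

With normalization the two clips count as identical because only the shape is compared; without it, the volume difference shows up in the score.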

Tempo is the odd one out. You can probably ignore it, but just in case: Beat Detection

The more of these metrics you can calculate, the more accurate your “similarity” metric is going to be.

Dear @VesuvianPrime

Your answer is convincing, but it looks complicated to implement. My case may be simpler to solve, though: I am trying to recognize pauses in speech and filler words such as “urm”, “uh”, and “err”.

Would an FFT comparison be good enough to do the trick?

Sorry for joining the discussion late. I only recently embarked on this project.