# How can I use spectrum data to create frequency-based beat detection?

Hello everyone.

I’ve been thinking about making a rhythm game for a long time and I want to take action. The problem is, after all the research and reading through old posts on the Unity forums, Reddit, etc., I still don’t have a clue how to create what I’m aiming for. At this point I’m not even sure I’m approaching this the right way, because I probably lack the technical knowledge.

So here’s the thing, as far as I understand:

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Something : MonoBehaviour
{
    public AudioSource audioSource;
    public float[] freqData = new float[1024];

    void Start()
    {
        audioSource = GetComponent<AudioSource>();
    }

    void Update()
    {
        // Channel 0, Blackman-Harris window
        audioSource.GetSpectrumData(freqData, 0, FFTWindow.BlackmanHarris);
    }
}
```

Let’s say I have this code in place. My sampling rate is 48,000 Hz. The array size determines the frequency range of each element: with a size of 1024, my frequency resolution is (48,000 / 2) / 1024 ≈ 23.4 Hz. So freqData[0] represents the frequencies between 0 and 23.4 Hz, freqData[1] represents 23.4–46.8 Hz, and so on and so forth. And I want to detect, let’s say, bass beats between 60 Hz and 250 Hz.
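For concreteness, the bin arithmetic can be checked in a few lines (Python here purely to illustrate the numbers; 48 kHz and 1024 elements are the values from the post). Note that 60–250 Hz actually lands at indices 2–10 rather than 4–11:

```python
# Frequency resolution of the GetSpectrumData output:
# each of the 1024 elements covers (sample_rate / 2) / 1024 Hz.
sample_rate = 48000
num_bins = 1024
bin_width = (sample_rate / 2) / num_bins  # 23.4375 Hz per element

# Which elements cover the 60-250 Hz bass range?
low_bin = int(60 // bin_width)    # first element touching 60 Hz
high_bin = int(250 // bin_width)  # last element touching 250 Hz
print(bin_width, low_bin, high_bin)
```

Each bin k covers k × 23.4375 Hz up to (k + 1) × 23.4375 Hz, so 60 Hz falls in bin 2 and 250 Hz in bin 10.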

How can I translate this data into “hey Unity, I want you to check those wiggling small numbers (relative amplitudes per frequency band, I think) from index 4 to 11, and if any of them is above a certain threshold (in decibels or whatever, I don’t even know), I want to register them as beats”?

And every time a SendMessage(“BeatDetected”) fires, I can instantiate a cube to represent the beat, for instance.
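A minimal sketch of that idea, in Python for illustration (in Unity the same logic would live in Update() in C#; the threshold value and the rising-edge re-arm are assumptions to tune, not part of any Unity API):

```python
def band_energy(freq_data, low_bin, high_bin):
    """Sum of spectrum magnitudes across the chosen bins (inclusive)."""
    return sum(freq_data[low_bin:high_bin + 1])

class ThresholdBeatDetector:
    # threshold is a guess you'd tune by watching the numbers for your track
    def __init__(self, low_bin=2, high_bin=10, threshold=0.1):
        self.low_bin = low_bin
        self.high_bin = high_bin
        self.threshold = threshold
        self.armed = True  # only fire on the rising edge

    def update(self, freq_data):
        """Call once per frame with the GetSpectrumData array.
        Returns True exactly once per threshold crossing."""
        e = band_energy(freq_data, self.low_bin, self.high_bin)
        if e > self.threshold and self.armed:
            self.armed = False   # fired; wait for the band to go quiet
            return True          # -> this is where you'd SendMessage / spawn a cube
        if e < self.threshold:
            self.armed = True    # re-arm once the energy drops back down
        return False
```

Without the re-arm flag, a single kick lasting several frames would register as several beats.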

Even though this is an extremely complicated subject, I feel like what I want to achieve is relatively simple to accomplish, yet I feel like an idiot for not being able to do it.

Hmmm… take this as commentary from a complete noob to Unity. I don’t know how Unity handles audio (yet); in fact I’m only at the “make a cube move with a key press and celebrate with beer” stage… but here are a few thoughts.

“And I want to detect, let’s say bass beats between 60Hz and 250 Hz.”

The majority of a kick drum’s body/weight usually lives around 50Hz (with the attack around the 5kHz area).

"How can I translate this data into "hey unity I want you to check those wiggling small numbers(relative amplitudes of frequency i think) from index 4 to 11 . . . "

This would be likely to produce unreliable results, as there is all sorts of mayhem going on under 250 Hz, not just cleanly defined kicks (unless the track is literally just a kick track). For example, the main body of most snare drums (especially modern snares) falls into this range (around 200 Hz), as does the low end of most instruments, including the human voice, and of course the bass line.

One way around this would be to work with stems rather than a full mix. An even better / simpler approach, I would have thought, would be to use a very steep (48 dB/oct) low-pass filter on the whole track and then simply detect peaks above a certain threshold?
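A sketch of that filter-then-threshold idea, assuming raw time-domain samples (a one-pole low-pass here for brevity; a genuine 48 dB/oct slope would cascade several stages or use proper biquads):

```python
def low_pass(samples, alpha=0.05):
    """One-pole low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
    Smaller alpha -> lower cutoff. A steep 48 dB/oct filter would
    cascade several of these, but the shape of the approach is the same."""
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def detect_peaks(filtered, threshold):
    """Indices where the filtered signal first crosses above the threshold."""
    peaks = []
    above = False
    for i, v in enumerate(filtered):
        if v > threshold and not above:
            peaks.append(i)
            above = True
        elif v <= threshold:
            above = False
    return peaks
```

Feeding it two bursts of signal separated by silence yields exactly two detected peaks, one per burst.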

An even simpler approach would be to reproduce the rhythm track: drum along to the track (after making sure you’ve locked down the track’s tempo) on your keyboard in GarageBand/Logic, using a simple, easily detectable sound such as a rim shot. This gives you a super clean signal, which you can then use to ‘detect’ the kick drum in the actual track (in reality Unity will be detecting the rim shot)… or maybe Unity has a MIDI reader?

“The array size determines the frequency range of each element. So with a size of 1024, my frequency resolution is 23.4 Hz. So freqData[0] represents the freq between 0-23.4 Hz. freqData[1] represents 23.4-46.8 Hz. so on and so forth. And I want to detect, let’s say bass beats between 60Hz and 250 Hz.”

This would be pretty inefficient the higher up the frequency range you go, as pitch is perceived on a logarithmic scale. Let’s take a range of 50 Hz: there is a pretty big difference between 50 Hz and 100 Hz. From 50 Hz to 100 Hz is one octave; you might be moving between two entirely different instruments with a 50 Hz shift this low, and listening to a sine wave, no one will have trouble hearing the difference between 50 Hz and 100 Hz.

Whereas a 50 Hz shift at the other end, say 10,000 Hz to 10,050 Hz, is imperceptible. Here, rather than spanning an octave, both 10,000 Hz and 10,050 Hz are the same musical note (D#9), with 10,050 Hz being a tiny (I mean imperceptibly) bit sharper than 10,000 Hz; to pretty much everyone who is not a bat, they are the same pitch. At 10,000 Hz you’d need to jump to 20,000 Hz to cover one octave: a change of 10,000 Hz, rather than 50 Hz. Hope that makes sense!

Also, you would not really need to measure down to 0 Hz! 0 Hz is basically a DC offset, and a signal containing DC is basically limiting your dynamic range. I think for most things you can safely start measuring at 50 Hz, and you probably don’t need to go above 15 kHz, probably much less if you are extracting rhythmic information.

A quick picture to show you what I mean !

Here are 4 frequencies, 50Hz, 100Hz, 10,000Hz and 10,050Hz.

The reason you can only see 3 is that 10,000Hz and 10,050Hz are sat on top of each other, as they are practically the same frequency. If you wanted to split up the spectrum of an audio signal - and have it be of any use - it would need to be split logarithmically.

Final thought (I promise): rather than splitting the audio into 1024 elements, if you were to split it logarithmically into individual musical notes, you could cover 8 octaves (50 Hz – 12,800 Hz) with just 96 elements (12 notes × 8 octaves).
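The semitone-band arithmetic can be sketched like this (50 Hz start and 8 octaves as in the post; 2^(1/12) is the frequency ratio between adjacent notes):

```python
NOTES_PER_OCTAVE = 12
SEMITONE = 2 ** (1 / 12)  # frequency ratio between adjacent musical notes

def note_band_edges(f_start=50.0, octaves=8):
    """Lower edge of each semitone-wide band across the given octaves.
    8 octaves x 12 notes = 96 bands, so 97 edges."""
    n = octaves * NOTES_PER_OCTAVE
    return [f_start * SEMITONE ** i for i in range(n + 1)]

edges = note_band_edges()
# 96 bands; the final edge lands exactly 8 octaves (x256) above the start
```

Since each octave doubles the frequency, the 97th edge is 50 × 2⁸ = 12,800 Hz, matching the range quoted above.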

Hi Socks. Thank you for your very detailed comment. First of all, I want to start with why 1024 samples: I don’t know the technical reason behind it, but the sample count must be a power of 2 for the FFT in Unity, and maybe everywhere else as well. It has to be between 64 and 8192 according to the GetSpectrumData manual.

This link is just one of them, and I’m using it as an example, but I have read more than 20 articles like this about beat detection, signal processing, spectrum analysis, etc. I don’t want to sound cocky; obviously I still lack a huge amount of technical understanding of the subject, but I believe I have some idea of what I have to do. I just don’t know how to do it.

As I mentioned in my post (and I kind of feel like I’m rewriting it), when I call GetSpectrumData, it gives me 1024 small numbers that change as the song plays. From the stuff I’ve read so far, I know each element in the array represents a frequency range; I just don’t know what to do with them.

I have talked to one of the audio engineering professors at my school. He told me that if I want to detect a beat, I have to detect transients in the song, and how transients differ from the rest of the song, waveform-wise. So I need to calculate the average energy of the signal, and if there’s a sudden energy change/spike at a particular moment, that’s a beat. And I don’t want to do this with every frequency band. I’m not planning to use classical music in my game. Hell, since we’re at it, let me even give an example. Don’t wanna go too off-topic, but:

this song, for instance. If I’ve not misunderstood the information the audio engineering professor gave me, this song (or any song like it, with relatively simple beats) is way easier to process for beat detection than, let’s say, rock songs, where transients aren’t so spiky relative to the rest of the song. I don’t wanna babble too much about irrelevant stuff though.
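The professor’s transient idea (compare each frame’s energy to a recent average, flag sudden spikes) might be sketched like this in Python; the history length and sensitivity factor are guesses you’d tune, not standard values:

```python
from collections import deque

class EnergyBeatDetector:
    """Flags a beat when the current frame's energy spikes well above the
    recent average energy: the transient-detection idea."""
    def __init__(self, history=43, sensitivity=1.5):
        # ~43 frames is roughly a second of history at typical frame rates;
        # both numbers are tuning guesses, not gospel
        self.energies = deque(maxlen=history)
        self.sensitivity = sensitivity

    def update(self, frame_energy):
        """Call once per frame with the band (or total) energy.
        Returns True when the energy spikes above the running average."""
        is_beat = False
        if len(self.energies) == self.energies.maxlen:
            avg = sum(self.energies) / len(self.energies)
            is_beat = frame_energy > self.sensitivity * avg
        self.energies.append(frame_energy)
        return is_beat
```

A steady signal never fires; a frame twice the running average does, which is exactly why spiky electronic beats are easier to catch than rock transients buried in a dense mix.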

Anyway, the shortest, simplest form of my question is: I don’t know what to do with the data I get from GetSpectrumData. How can I use it to calculate energy differences, or to pinpoint a frequency range?

I’m not quite sure what you are saying here; in your original post I thought the 1024 referred to 1024 individual ranges (48 kHz / 2 (Nyquist) / 1024 = 23.4 Hz . . . ?).

It’s not clear what this has to do with ‘samples’ (a measure of time rather than frequency) ? Maybe I’m confused about what you mean ?

Yeah, that’s what I thought you meant: splitting the spectrum into 1024 parts. Although, like I say, this is a very impractical way of doing things; in the lower register one of these ranges will cover ~4 musical notes, and up at the high end you’ll have dozens of them clustered around one note! You would be much better off splitting the spectrum musically into 8 or 9 octaves, with a range for each note: fewer than 100 numbers and, especially at the high end, a much more accurate measure, rather than having dozens or even hundreds of numbers being measured for a single note!

With that example song you wouldn’t need any other information than the tempo, there’d be no real need to analyse the audio, it’s simply 4/4, you’d just time your game events suitably. But I guess you are planning to use lots of different songs.

By samples I meant the array elements; wrong wording on my part. (48 kHz / 2 (Nyquist) / 1024 = 23.4 Hz?) I assume this is the correct calculation: when you get the spectrum data, it’s half your sampling rate divided by 1024, or something like that. With this song it might be enough to have the BPM/tempo, but not all the songs I want to use are like this. I could actually create 7 sub-bands to represent the frequency ranges, like in the link I posted in my previous post: 20–60 Hz for sub-bass, which would be subBand[0], 60–250 Hz for bass = subBand[1], etc. Kinda like the audio visualizers in media players like Winamp. I still feel like what I want to achieve is relatively simple, but anyway, thanks for the input.
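Mapping those sub-bands onto the 23.4 Hz-wide elements could look roughly like this (the band edges beyond the two named in the post are the usual mixing-chart values, filled in here as an assumption):

```python
# Common sub-band edges in Hz; only 20-60 (sub-bass) and 60-250 (bass)
# come from the post, the rest are the conventional chart values.
SUB_BANDS = [(20, 60), (60, 250), (250, 500), (500, 2000),
             (2000, 4000), (4000, 6000), (6000, 20000)]

BIN_WIDTH = 24000 / 1024  # 23.4375 Hz per element at 48 kHz

def band_energies(freq_data):
    """Total magnitude in each sub-band, from the 1024-element array."""
    out = []
    for low_hz, high_hz in SUB_BANDS:
        lo = int(low_hz / BIN_WIDTH)
        hi = min(int(high_hz / BIN_WIDTH), len(freq_data) - 1)
        out.append(sum(freq_data[lo:hi + 1]))
    return out
```

Each of the seven sums could then feed its own threshold or running-average detector, giving the visualizer-style bands the post describes.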

No problem, good luck, let us know if you make any progress!