Onset Detector / Music analyzer

I made an onset detector in C#, with Unity and games in mind.

It tries to detect beats, snares, hats and other peaks in certain frequency ranges. It does a decent job with most genres. It can detect the presence of singing and melodies too. It analyzes an entire song within a couple of seconds. The data can then be used in a game, or something else.

I’m planning to make a game out of this. In the video you can see some of the basic functionality.
Another feature is that it can import MP3 files at runtime, which is important for this kind of game.

I’ll probably release the source for the onset-detector, when it’s all cleaned up. It’s a huge mess now. :slight_smile:

That’s amazing :smile:
Are you going to release the source for this?

Eventually. It’s really bulky and still has a lot of unnecessary parts. I’ll release it when it’s cleaned up.

I’m currently not working on it. I’m working on a game that uses this, so eventually I’ll come to it.

Oh it’s fine :slight_smile: I’m just really interested in how you did it (I have attempted one myself but failed miserably…)

I’m doing a Fourier transform on samples from AudioClip.GetData. With the resulting spectrum, I can detect the onsets. More info on onset detection can be found here.
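For anyone following along, here is a rough C++ sketch of one common way to go from a spectrum to onsets, namely spectral flux. It is not the detector discussed in this thread: the names `magnitudeSpectrum`, `spectralFlux` and `pickOnsets`, the non-overlapping frames, the naive DFT and the mean-based threshold are all assumptions made for illustration. A real implementation would use a proper FFT and probably a window function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude spectrum of one frame using a naive O(n^2) DFT.
// Assumption: good enough for a sketch; a real detector would use an FFT.
std::vector<float> magnitudeSpectrum(const std::vector<float>& frame)
{
    const std::size_t n = frame.size();
    const double twoPi = 6.283185307179586;
    std::vector<float> mags(n / 2 + 1);
    for (std::size_t k = 0; k < mags.size(); ++k)
    {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t)
        {
            const double angle = twoPi * double(k) * double(t) / double(n);
            re += frame[t] * std::cos(angle);
            im -= frame[t] * std::sin(angle);
        }
        mags[k] = float(std::sqrt(re * re + im * im));
    }
    return mags;
}

// Spectral flux: for each frame, sum the positive increases in bin
// magnitude relative to the previous frame (frames do not overlap here).
std::vector<float> spectralFlux(const std::vector<float>& samples, std::size_t frameSize)
{
    std::vector<float> flux;
    std::vector<float> previous(frameSize / 2 + 1, 0.0f);
    for (std::size_t start = 0; start + frameSize <= samples.size(); start += frameSize)
    {
        const std::vector<float> frame(samples.begin() + start,
                                       samples.begin() + start + frameSize);
        const std::vector<float> current = magnitudeSpectrum(frame);
        float sum = 0.0f;
        for (std::size_t k = 0; k < current.size(); ++k)
        {
            const float diff = current[k] - previous[k];
            if (diff > 0.0f) sum += diff;
        }
        flux.push_back(sum);
        previous = current;
    }
    return flux;
}

// Hypothetical peak picker: a frame counts as an onset when its flux
// exceeds `multiplier` times the mean flux of the whole signal.
std::vector<std::size_t> pickOnsets(const std::vector<float>& flux, float multiplier)
{
    std::vector<std::size_t> onsets;
    if (flux.empty()) return onsets;
    float mean = 0.0f;
    for (float f : flux) mean += f;
    mean /= float(flux.size());
    for (std::size_t i = 0; i < flux.size(); ++i)
        if (flux[i] > multiplier * mean) onsets.push_back(i);
    return onsets;
}
```

Restricting the inner sum to a range of bins would give a per-band flux, which is one way to look for peaks in certain frequency ranges separately.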

I have already cleaned up and released the MP3 import at runtime scripts by the way.

Oh, if you use this, please give me some credit.

Thanks for that! :smile:
I don’t have Unity installed, so I can’t test the mp3 import code. What output do you get when you use an empty mp3? (Audacity → Generate → Silence)
From what I gathered, the resulting samples should all be zeros, right?

The following is what I used (it shouldn’t be much different from C# P/Invoke).
The mp3 I used 1163298–44532–$zero_silenced.zip (464 KB) was single tracked (mono) so I didn’t bother to deal with interleaved values.

inline float short_float(short val)
{
	return val < 0 ? val*(1/32768.0f) : val*(1/32767.0f);
}

mpg123_handle *m = NULL;
int  channels = 0, encoding = 0;
long rate = 0;
int err = MPG123_OK;

err = mpg123_init();		
m = mpg123_new(NULL, &err);
mpg123_open(m, "L:\\zero_silenced.mp3");
mpg123_getformat(m, &rate, &channels, &encoding);

err = mpg123_format_none(m);
err = mpg123_format(m, rate, channels, encoding);

// Get the first 2048 samples
const int TIME = 2048;

// 16-bit integer encoded in bytes, hence x2 size
unsigned char* buffer = new unsigned char[TIME*2];
size_t done = 0;
err = mpg123_read(m, buffer, TIME*2, &done);

float* samples = new float[TIME];
int index = 0;

// Iterate 2 bytes at a time (little-endian signed 16-bit samples)
for (size_t i = 0; i + 1 < done; i += 2)
{
	unsigned char first = buffer[i];
	unsigned char second = buffer[i + 1];
	short val = (short)(first | (second << 8));
	samples[index++] = short_float(val);
}

However I seem to be getting some weird values :face_with_spiral_eyes:
[0] -3.0517578e-005
[1] 0.00000000
[2] 3.0518509e-005
[3] 0.00000000
[4] 0.00000000
[5] 3.0518509e-005
[6] 0.00000000
[7] 3.0518509e-005
[8] 6.1037019e-005
[9] -6.1035156e-005
[10] 3.0518509e-005
[11] 0.00000000
[12] -6.1035156e-005
[13] 9.1555528e-005
[14] -6.1035156e-005
[15] 0.00000000
[16] 6.1037019e-005
[17] -9.1552734e-005
[18] 0.00000000
[19] 3.0518509e-005
[20] -6.1035156e-005
[21] 3.0518509e-005
[22] 0.00000000
[23] 0.00000000
[24] 0.00000000
[25] -3.0517578e-005
[26] 0.00000000
[27] 0.00000000
[28] -9.1552734e-005
[29] 6.1037019e-005
[30] -6.1035156e-005
[31] 6.1037019e-005
[32] -3.0517578e-005
[33] -6.1035156e-005
[34] 9.1555528e-005
[35] 0.00000000
[36] 3.0518509e-005
[37] 0.00000000
[38] 3.0518509e-005
[39] 0.00000000
[40] 0.00000000
[41] 9.1555528e-005
[42] -6.1035156e-005
[43] 9.1555528e-005
[44] 3.0518509e-005
[45] 6.1037019e-005
[46] -3.0517578e-005
[47] 0.00000000
[48] 3.0518509e-005
[49] -3.0517578e-005
[50] 6.1037019e-005
[51] -6.1035156e-005
[52] 3.0518509e-005
[53] 0.00000000
[54] -9.1552734e-005
[55] 9.1555528e-005
[56] -0.00012207031
[57] 0.00000000
[58] 0.00000000
[59] -9.1552734e-005
[60] 6.1037019e-005
[61] -9.1552734e-005
[62] 0.00000000
[63] -3.0517578e-005
[64] 3.0518509e-005
[65] 6.1037019e-005
[66] -3.0517578e-005
[67] 6.1037019e-005
[68] 0.00000000
[69] 3.0518509e-005
[70] 0.00000000
[71] 3.0518509e-005
[72] 6.1037019e-005
[73] 0.00000000
[74] 9.1555528e-005
[75] -6.1035156e-005
[76] 6.1037019e-005
[77] 3.0518509e-005
[78] 0.00000000
[79] 3.0518509e-005
[80] 0.00000000
[81] -3.0517578e-005
[82] 0.00000000
[83] -3.0517578e-005
[84] 0.00000000
[85] -3.0517578e-005
[86] 0.00000000
[87] 0.00000000
[88] 0.00000000
[89] -3.0517578e-005
[90] 0.00000000
[91] 0.00000000
[92] 3.0518509e-005
[93] -6.1035156e-005
[94] 0.00000000
[95] -3.0517578e-005
[96] 3.0518509e-005
[97] 6.1037019e-005
[98] -3.0517578e-005
[99] 0.00000000

I get the same kind of values. I’m not sure why, but I think it’s nothing to worry about.

They’re pretty small values, like 0.00005. It might be round-off error, or maybe compression artifacts.
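One way to check how small they are is to map the floats back to 16-bit integers. A small sketch, reusing the `short_float` scaling from the code above; the helper `float_short` is hypothetical and only inverts that scaling:

```cpp
#include <cmath>

// Same scaling as in the post above: negative values are divided by 32768,
// non-negative values by 32767.
inline float short_float(short val)
{
    return val < 0 ? val * (1 / 32768.0f) : val * (1 / 32767.0f);
}

// Hypothetical inverse of short_float: recover the 16-bit sample value.
inline short float_short(float val)
{
    return static_cast<short>(std::lround(val < 0 ? val * 32768.0f : val * 32767.0f));
}
```

Applied to the listed output, -3.0517578e-005 maps to -1, 6.1037019e-005 to 2, 9.1555528e-005 to 3 and -0.00012207031 to -4. The decoded samples stay within a few least significant bits of zero on a scale of roughly ±32767, which seems consistent with rounding or compression noise rather than a bug in the decoding code.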

I don’t want to rush you (not that I could). But I just had to add my voice to those that will be very happy the moment you release the source for that :slight_smile:

I am just beginning my research into the topic and having a working example in Unity would be awesome.

A question since you linked to the badlogic tutorial:
Did you basically just follow along with that tutorial to produce the same thing in C#, or did you use it as a base and continue from there? I guess my question is whether you consider the method used in the tutorial good enough for basic functionality, even if you went further than that.

I used the basic principles mentioned in that tutorial. I spent most time finding a decent FFT implementation that gave good results. This one seems to give the best results.

Currently I’m working on the game that is going to use this. I’m going to have to clean up the code for the onset detector, to be able to implement it properly. Right now it has a lot of debugging code and ambiguous and redundant parts. That’s mainly because of a lot of experimenting. I don’t think that’s going to help anyone to understand it.

I’m going at it soon. I’m a bit of a procrastinator though.

“I’m going at it soon. I’m a bit of a procrastinator though.”

I just began overhauling it. I’m rebuilding it from scratch. The old detector did all of the analyzing and detecting beforehand, which could take up to 20 seconds for songs over 10 minutes long, and even longer on older computers.

Right now I’m making it so it will do everything on the fly. The actual analysis can be given a head start, so the beat and onset information will be available before they happen, just like in the old version, which is very important if you want to use it in a game.
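To illustrate the head-start idea, here is a minimal C++ sketch of a scheduler that keeps the analysis position a fixed number of samples ahead of playback. It is only an assumption about how such a scheme could look, not the actual detector: the class `LookaheadScheduler` and its parameters are hypothetical, and the place where a frame would be analyzed is marked with a comment.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical scheduler: keeps the analysis cursor `lookaheadSamples`
// ahead of the playback position, one frame at a time.
class LookaheadScheduler
{
public:
    LookaheadScheduler(std::size_t totalSamples, std::size_t frameSize,
                       std::size_t lookaheadSamples)
        : total_(totalSamples), frame_(frameSize), lookahead_(lookaheadSamples) {}

    // Call once per game update with the current playback position (in samples).
    // Returns how many frames were consumed to restore the head start.
    std::size_t framesDue(std::size_t playbackSample)
    {
        const std::size_t target = std::min(total_, playbackSample + lookahead_);
        std::size_t frames = 0;
        while (analyzed_ + frame_ <= target)
        {
            // A real detector would analyze samples [analyzed_, analyzed_ + frame_) here.
            analyzed_ += frame_;
            ++frames;
        }
        return frames;
    }

    // Number of samples that have been analyzed so far.
    std::size_t analyzedSamples() const { return analyzed_; }

private:
    std::size_t total_;
    std::size_t frame_;
    std::size_t lookahead_;
    std::size_t analyzed_ = 0;
};
```

With a frame size of 1024 and a head start of 4096 samples, the first call at playback position 0 consumes four frames, and later calls only consume as many frames as playback has advanced, so the cost is spread over the song instead of being paid up front.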

Hi KoningStoma,

Any news on this topic? I would really appreciate being able to see your source code for this…
Looking forward to hearing from you… :smile:

I haven’t been here in a while.

This thing is still a bit messy. A lot of its functionality and properties are interwoven with the prototype I’m working on. Once I get my concept pinned down, I’ll redo it, so it will be usable in a wide variety of games.

Hello KoningStoma,
Has there been any progress with the project? I’ve been trying to accomplish onset detection, but unsurprisingly it’s not accurate at all.

Hello KoningStoma,
Actually, in my free time I’m working on a game about music, and I’m trying to use your code to draw an array of cubes, but I can’t make the cubes move at the exact moment of the beat. Could you please help me with this? Which methods do I need for this?

thx.

Hi Elysian.Zhen,
I have one question. Which array returns these values?
thx.

What exactly do you use to get raw PCM data from an mp3?

I’ve tried to use your code (thinking that AudioClip.GetData will yield me raw PCM), but it only filled the array with 0s. I used an array of size 1024 as a parameter.

Hiya, any recent news on this? I know this was back in 2015, but I’m extremely curious since I’m tackling something almost exactly like this.