Sound level detection from mic - problems with noise / smoothness of the pitch value

Hi there!

I have an idea for an app / game - who knows what it will end up being :).

I found some tutorials on YT - one on visualizing sound from microphone input, and another on getting sound values like the spectrum, loudness level (dB value) and pitch value (how high the sound is).
I merged these two things to get the pitch value from the microphone input - and I do get it, but with some issues…

First: in _audioSource.clip = Microphone.Start(null, true, 1, AudioSettings.outputSampleRate); the “1” is the recording length in seconds. As I understand it, the voice has to be recorded first (but really?), so I set it to 1 s because I couldn’t set a smaller value. Now I have a little reverb / echo effect… Is there any way to make it ‘real time’? In the inspector I only have the ‘Loop’ option enabled on the Audio Source.

Second problem: I have a 1:1 mapping - the Y position of the ObjectPlayer (a Sphere for now) follows the pitch value. When I sing high it goes up, when I sing low it goes down. But the pitch value also picks up every scratch and noise, so the ObjectPlayer jumps and shakes irregularly - in other words, it looks rough, not what I want :smile:. I want to limit it to pure singing so I only get musical notes.

I’m looking for a way to limit that and get full control of the movement in an easy way.

Here is my code of AudioAnalyser script:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class AudioAnalyser : MonoBehaviour
{
    // Microphone input
    public AudioClip _audioclip;
    public bool _useMicrophone;
    public string _selectedDevice;

    private const int SAMPLE_SIZE = 1024;

    public float rmsValue;
    public float dbValue;
    public float pitchValue;

    private AudioSource _audioSource;
    private float[] samples;
    private float[] spectrum;
    private float sampleRate;


    private void Start()
    {
        _audioSource = GetComponent<AudioSource>();

        // Microphone input
        if (_useMicrophone)
        {
            if (Microphone.devices.Length > 0)
            {
                _selectedDevice = Microphone.devices[0];
                //_audioSource.clip = Microphone.Start(_selectedDevice, true, 10, AudioSettings.outputSampleRate);
                _audioSource.clip = Microphone.Start(null, true, 1, AudioSettings.outputSampleRate);
            }
            else
            {
                _useMicrophone = false;
            }
        }
        if (!_useMicrophone)
        {
           //_audioSource.clip = _audioclip;
        }

        _audioSource.Play();

        samples = new float[SAMPLE_SIZE];
        spectrum = new float[SAMPLE_SIZE];
        sampleRate = AudioSettings.outputSampleRate;

    }

    private void Update()
    {
        AnalyzeSound();
        Debug.Log(pitchValue);
    }

    private void AnalyzeSound()
    {
        _audioSource.GetOutputData(samples, 0);

        // Get the RMS (root mean square) of this sample block
        int i = 0;
        float sum = 0;
        for (; i < SAMPLE_SIZE; i++)
        {
            sum += samples[i] * samples[i]; // accumulate the squared samples
        }
        rmsValue = Mathf.Sqrt(sum / SAMPLE_SIZE);

        // Get the dB value, relative to a 0.1 reference level
        dbValue = 20 * Mathf.Log10(rmsValue / 0.1f);

        // Get the sound spectrum
        _audioSource.GetSpectrumData(spectrum, 0, FFTWindow.BlackmanHarris);

        // Find the pitch: locate the loudest bin in the spectrum
        float maxV = 0;
        var maxN = 0;
        for (i = 0; i < SAMPLE_SIZE; i++)
        {
            if (spectrum[i] > maxV && spectrum[i] > 0.0f)
            {
                maxV = spectrum[i];
                maxN = i;
            }
        }

        // Interpolate the peak index with its neighbours for a finer estimate
        float freqN = maxN;
        if (maxN > 0 && maxN < SAMPLE_SIZE - 1)
        {
            var dL = spectrum[maxN - 1] / spectrum[maxN];
            var dR = spectrum[maxN + 1] / spectrum[maxN];
            freqN += 0.5f * (dR * dR - dL * dL);
        }
        // Convert the bin index to a frequency: the spectrum spans 0..sampleRate/2
        pitchValue = freqN * (sampleRate / 2) / SAMPLE_SIZE;


    }
}

And a simple test script for the object movement:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Movement : MonoBehaviour
{
    //public Transform AudioAnalyser;
    public Vector2 ppos;

    private AudioAnalyser audioAnalyse;

    void Start()
    {
        // Cache the analyser once instead of looking it up every frame
        audioAnalyse = GameObject.Find("theCamera").GetComponent<AudioAnalyser>();
    }

    void Update()
    {
        ppos = new Vector2(transform.position.x, audioAnalyse.pitchValue / 50);
        transform.position = ppos;
    }
}

Does anybody have an idea how to fix that :slight_smile: ?

I just want pure control of the playerObject by singing notes with my voice - without uncontrolled jumps in pitchValue.

Here is a video of how it works now:


Without having tested your code, your 2nd problem might be caused by harmonics. Try a band-pass filter on your voice to cut off frequencies below 100 Hz and above ~400 Hz.
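
In Unity you could do that in a script with the built-in AudioHighPassFilter and AudioLowPassFilter components on the same GameObject as your AudioSource. A minimal sketch of the idea (the cutoff values are just starting points you would have to tune, and I’m not 100% sure the filters affect what GetSpectrumData on the AudioSource returns, so check that):

using UnityEngine;

// Attach this next to the AudioSource that plays the microphone clip.
// The high-pass + low-pass pair together act as a band-pass filter.
public class VoiceBandPass : MonoBehaviour
{
    public float lowCutoff = 100f;   // cut everything below ~100 Hz
    public float highCutoff = 400f;  // cut everything above ~400 Hz

    void Start()
    {
        var highPass = gameObject.AddComponent<AudioHighPassFilter>();
        highPass.cutoffFrequency = lowCutoff;

        var lowPass = gameObject.AddComponent<AudioLowPassFilter>();
        lowPass.cutoffFrequency = highCutoff;
    }
}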

Hi.

How can I implement this filter - is it a hardware device, or just some kind of script?
I limited the notes to around 170 - 1000 Hz (the highest notes of my voice reach more than 1 kHz), but only in the movement script of my player object, to detect musical notes. For example, for the lowest note, E:

if (audioAnalyse.pitchValue <= 169.99 && audioAnalyse.pitchValue >= 160) // 164.81 = E3 ////////// lowest note
{
    transform.position = new Vector2(transform.position.x, -3.5f);
}

And the same kind of lines (with their specific frequency ranges) for the other 30 notes, from E3 up to A5.
But I assume that this filter will limit the range on the input side? Please tell me more about that.

I need to get the clearest possible note from the voice and cut off all the disturbing noises, to get a perfect match for the Player Object height. For now, it often jumps between 2 or more heights of different notes :/

This spectrum analyzer is pretty cool - GitHub - keijiro/unity-audio-spectrum: Provides spectrum data with audio output (Unity, C#)

If you choose the 31-band mode, you can probably see multiple peaks for your voice audio in the editor. That’s the base frequency + harmonics.
In AudioSpectrum.cs, you can set your own frequency bands as a float array. Change those to the frequencies of your 30 notes.
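
For your 30 notes from E3 to A5 that could look roughly like this (standard A440 equal-temperament values; the exact field name and layout in AudioSpectrum.cs may differ from this sketch, and for semitone spacing the matching bandwidth factor should be roughly 2^(1/24) ≈ 1.03):

// Frequencies (Hz) of the 30 semitones from E3 up to A5, with A4 = 440 Hz
static float[] noteFrequencies = {
    164.81f, 174.61f, 185.00f, 196.00f, 207.65f, 220.00f, // E3 - A3
    233.08f, 246.94f, 261.63f, 277.18f, 293.66f, 311.13f, // A#3 - D#4
    329.63f, 349.23f, 369.99f, 392.00f, 415.30f, 440.00f, // E4 - A4
    466.16f, 493.88f, 523.25f, 554.37f, 587.33f, 622.25f, // A#4 - D#5
    659.26f, 698.46f, 739.99f, 783.99f, 830.61f, 880.00f  // E5 - A5
};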

Modify the Update() method like this

    void Update()
    {
        CheckBuffers();

        AudioListener.GetSpectrumData(rawSpectrum, 0, FFTWindow.BlackmanHarris);

        float[] middlefrequencies = middleFrequenciesForBands[(int)bandType];
        var bandwidth = bandwidthForBands[(int)bandType];

        var falldown = fallSpeed * Time.deltaTime;
        var filter = Mathf.Exp(-sensibility * Time.deltaTime);

        // track the loudest band and its index
        int tmpIndex = 0;
        float tmpMax = 0;

        for (var bi = 0; bi < levels.Length; bi++)
        {
            int imin = FrequencyToSpectrumIndex(middlefrequencies[bi] / bandwidth);
            int imax = FrequencyToSpectrumIndex(middlefrequencies[bi] * bandwidth);

            var bandMax = 0.0f;
            for (var fi = imin; fi <= imax; fi++)
            {
                bandMax = Mathf.Max(bandMax, rawSpectrum[fi]);
            }

            levels[bi] = bandMax;
            peakLevels[bi] = Mathf.Max(peakLevels[bi] - falldown, bandMax);
            meanLevels[bi] = bandMax - (bandMax - meanLevels[bi]) * filter;

            if (bandMax > tmpMax)
            {
                tmpMax = bandMax;
                tmpIndex = bi;
            }
        }

        // mute every band except the loudest one
        for (var bi = 0; bi < levels.Length; bi++)
        {
            levels[bi] = bi == tmpIndex ? levels[bi] : 0;
        }
    }

in order to isolate the loudest band (levels array). All other bands get muted. That should give you some precision for your controls.

Thank you,
I downloaded it and tested it with the attached test music audio clip, and it looks interesting in the inspector Audio Spectrum window. I replaced the Update section with yours, and I see more selective peaks while the music is playing. So it looks much more precise, but I don’t know how to use it for my idea - for example, how to get the freq / pitch values out of this code to compare/match them to the object’s Y position. It’s a little bit complicated for me, I’m not an expert in C#, but I will try to figure out how to do it. If you want to give me a little help / advice with that, it would be great!

The code should work for detecting discrete notes.
Try creating your own frequency array for the notes you want to detect.
https://en.wikipedia.org/wiki/Piano_key_frequencies
https://github.com/keijiro/unity-audio-spectrum/blob/master/AudioSpectrum.cs (line 19 ff)

In the modified Update method, tmpIndex is set to the index of the loudest frequency band. That would be the index of your note. You could use tmpIndex for controlling your object position.
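
For example, something along these lines. It assumes you add a public field (say, LoudestBandIndex) to AudioSpectrum yourself and assign tmpIndex to it at the end of Update() - that field is not part of the original script, and the class/field names here are just placeholders:

using UnityEngine;

public class NoteMovement : MonoBehaviour
{
    public AudioSpectrum spectrum;    // drag the AudioSpectrum component here
    public float unitsPerBand = 0.2f; // vertical spacing per band

    void Update()
    {
        // LoudestBandIndex is the field you would add to hold tmpIndex
        float y = spectrum.LoudestBandIndex * unitsPerBand;
        transform.position = new Vector2(transform.position.x, y);
    }
}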

Thank you mbaske. I used tmpIndex to control my object (for testing: transform.position.y = tmpIndex), but it only takes the values from line 25 - there are 31 frequency values, so it is always a specific value like 100, 125, 200 etc. I tested it, and it sometimes catches a note from the test music audio clip - my object jumps to those positions, but very rarely, because there is no tolerance :smile:. I mean, I need to turn these note frequencies from exact numbers into ranges, for example 100 into [90-120], so the object position is set to 100 even if the frequency from the microphone input is 90, 92 or 119.
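
Something like this is roughly what I have in mind - just a sketch of picking the nearest note within a tolerance (the array content, the tolerance and the 0.25f / -3.5f spacing are placeholder values):

using UnityEngine;

public class NoteSnapper : MonoBehaviour
{
    public AudioAnalyser audioAnalyse;   // the analyser providing pitchValue
    public float tolerance = 10f;        // max distance in Hz to still accept a note

    // placeholder - fill in the 30 note frequencies E3..A5
    public float[] noteFrequencies = { 164.81f, 174.61f, 185.00f, 196.00f };

    void Update()
    {
        float pitch = audioAnalyse.pitchValue;

        // find the note frequency closest to the measured pitch
        int nearest = -1;
        float bestDistance = float.MaxValue;
        for (int i = 0; i < noteFrequencies.Length; i++)
        {
            float distance = Mathf.Abs(pitch - noteFrequencies[i]);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                nearest = i;
            }
        }

        // only move if the pitch is reasonably close to a known note
        if (nearest >= 0 && bestDistance <= tolerance)
        {
            transform.position = new Vector2(transform.position.x, nearest * 0.25f - 3.5f);
        }
    }
}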

Hi @Avenged90x I know this post is a year old, but I am doing something similar and have the same problem with the frequency jumping sporadically, exactly the way your video shows. Not sure if you had any luck fixing it or not?

I have a temporary solution: when I analyse the samples, I only analyse up to around the average range of the human voice, to manually cut off the higher frequencies.
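
Roughly like this - just a sketch of the idea, dropped into the peak search of the AudioAnalyser script above; the 80 Hz and 1000 Hz limits are guesses for a vocal range and would need tuning:

// Limit the peak search to bins that correspond to typical vocal frequencies.
// With SAMPLE_SIZE bins spanning 0..sampleRate/2, one bin covers (sampleRate/2)/SAMPLE_SIZE Hz.
float binWidth = (sampleRate / 2f) / SAMPLE_SIZE;
int minBin = Mathf.FloorToInt(80f / binWidth);    // ignore rumble below ~80 Hz
int maxBin = Mathf.CeilToInt(1000f / binWidth);   // ignore everything above ~1 kHz

float maxV = 0;
int maxN = 0;
for (int i = minBin; i <= maxBin && i < SAMPLE_SIZE; i++)
{
    if (spectrum[i] > maxV)
    {
        maxV = spectrum[i];
        maxN = i;
    }
}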