Problem getting data from an AudioSource.clip - frequency check always returns 5512

I’m trying to build a very rudimentary pitch contour from a voice sample, and as I don’t need much accuracy the easiest way to do this seemed to be to check the fundamental frequency of the recording at some given interval, build an array containing the value of the F0 at each time, then draw that array as a graph to show the general trend of the speaker’s pitch.
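I’m trying to accomplish this with two functions. ParseAudioData works from an AudioClip and a float interval, which is the number of samples I want to gather, spaced evenly throughout the file. GetFundamentalFrequency is given an AudioSource and a time within the clip, and should return the F0 at that time.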

float interval = 10f;

float[] ParseAudioData(AudioClip myClip)
{
    mySource.clip = myClip; // Give the clip I want to analyze to the AudioSource
    // Set timeInterval to the time in seconds I should advance through myClip each time I take a new sample.
    // If interval = 4f and myClip.length = 1f, timeInterval = 0.25f
    float timeInterval = myClip.length / interval;
    float[] frequencyArray = new float[(int)interval]; // Create a new array with one index for each sample I want to take
    mySource.time = 0; // Set the clip to 0 minutes, 0 seconds
    float totalInterval = 0f; // This will be used to keep track of how far I've advanced through the file
    for (int i = 0; i < interval; i++) // For each sample I want to take...
    {
        // Run GetFundamentalFrequency, which returns the F0 as a float, and store that number in frequencyArray
        frequencyArray[i] = GetFundamentalFrequency(mySource, totalInterval);
        totalInterval += timeInterval; // Increment totalInterval by the amount of time we want to advance before taking the next sample
    }
    return frequencyArray; // Hand a completed array filled with F0s taken from various points in the audio clip to whoever called the function
}



// Given an AudioSource, and the time within the file we should get data from,
// this should return the fundamental frequency of the voice recorded at time sourceTime.
float GetFundamentalFrequency(AudioSource mySource, float sourceTime)
{
    float fundamentalFrequency = 0.0f;
    float[] data = new float[8192];
    mySource.time = sourceTime;
    mySource.Play();
    mySource.GetSpectrumData(data, 0, FFTWindow.BlackmanHarris);
    mySource.Stop();
    float s = 0.0f;
    int i = 0;
    for (int j = 1; j < 8192; j++)
    {
        if (s < data[j])
        {
            s = data[j];
            i = j;
        }
    }
    fundamentalFrequency = i * samplerate / 8192;
    return fundamentalFrequency;
}

Now this looks right to me, and I expect ParseAudioData to return an array filled with F0s from different points in the audioclip, but every time I run this, no matter what audio file I feed it, every value in my F0 array is set to 5512. I’ve been staring at this for two days, but I can’t see where my mistake is… is AudioSource not meant to be used this way?

After some testing it seems like GetSpectrumData doesn’t work when you use Play and Stop like that.
Try this for yourself. You’ll see a flat line when you try it like this, but if you comment out Play and Stop, you’ll see a spectrum.

// Update is called once per frame
    void Update ()
    {
        audio.timeSamples=(int)Random.Range(1,audio.clip.samples);
      
        float[] data = new float[8192];

        audio.Play ();
        audio.GetSpectrumData (data, 0, FFTWindow.BlackmanHarris);
        audio.Stop ();
      
        for (int i = 0; i<data.Length-1; i++) {                      
                Vector3 sv = new Vector3 (i * -1, data [i]*100, 0);
                Vector3 ev = new Vector3 ((i + 1) * -1, data [i + 1]*100, 0);
               
                Debug.DrawLine (sv, ev, Color.red);
            }
    }

Okay, so changing .Stop to .Pause makes this work, but I think there’s something screwed up in my math. I’m trying to get the F0 for a human voice, which should be somewhere between 20-200 Hz in most cases. Sometimes I get accurate results, but this keeps giving me results between 500-1200 Hz, which makes me think it’s returning F1, not F0.

I don’t think the indices in the spectrum directly map to the frequencies. Do you take that into account?

I thought I did. The functionality I’m trying to nail down is drawing a line that roughly corresponds to the speaker’s pitch: voice goes up, line goes up, and vice versa. Since I don’t know nearly enough to do a complicated analysis of a human voice, I’ve been working off of a tutorial to grab the F0 as the easiest indicator of pitch level. The actual math I’m doing to get it is based on the bin, not the exact frequency: I’m just iterating through each bin, and when I find the strongest bin, I calculate its frequency by going
f = binIndex * samplerate / bins

I’m slightly sure that should work, does it look right to you?

Don’t you have to use half the sample rate? I’m not sure either.

frequency = index*(sampleRate/2)/maxIndex
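
To make that formula concrete, here is a minimal sketch in plain C# with no Unity calls. It assumes the spectrum array covers 0 Hz up to half the sample rate (the Nyquist frequency), so each bin is (sampleRate / 2) / binCount wide. The class name SpectrumMath and both method names are made up for illustration, and it divides by the number of bins rather than maxIndex, which shifts the result by at most one bin width.

```csharp
using System;

public static class SpectrumMath
{
    // Converts a spectrum bin index to an approximate frequency in Hz.
    // Assumption: the binCount bins span 0 Hz to sampleRate / 2 (Nyquist).
    public static float BinToFrequency(int binIndex, int sampleRate, int binCount)
    {
        // Floating-point math avoids the truncation integer division would cause.
        return binIndex * (sampleRate / 2f) / binCount;
    }

    // Returns the index of the largest value in the spectrum,
    // skipping bin 0 (the DC component), as the loop in the question does.
    public static int StrongestBin(float[] spectrum)
    {
        int best = 0;
        float max = 0f;
        for (int j = 1; j < spectrum.Length; j++)
        {
            if (spectrum[j] > max)
            {
                max = spectrum[j];
                best = j;
            }
        }
        return best;
    }
}
```

With a 44100 Hz sample rate and 8192 bins, each bin is about 2.69 Hz wide, so bin 50 comes out at roughly 134.6 Hz. The original binIndex * samplerate / bins would report about 269.2 Hz for the same bin, exactly twice as high.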

That seems to work much better, thank you!
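
On the earlier worry about picking up a higher peak instead of F0: one possible refinement, not something anyone in this thread tested, is to search only the bins that fall inside the range where you expect the voice's fundamental to be. The sketch below is plain C# and again assumes the bins span 0 Hz to half the sample rate. The class name, method name, and the minHz/maxHz parameters are hypothetical, and the band limits are values you would have to choose for your own recordings.

```csharp
using System;

public static class BandLimitedPeak
{
    // Returns the index of the strongest bin whose frequency lies between
    // minHz and maxHz, or -1 if no bin in that band is above zero.
    // Assumption: spectrum.Length bins span 0 Hz to sampleRate / 2.
    public static int StrongestBinInBand(float[] spectrum, int sampleRate, float minHz, float maxHz)
    {
        float binWidth = (sampleRate / 2f) / spectrum.Length;

        // Clamp the band to valid indices and skip bin 0 (DC).
        int first = Math.Max(1, (int)Math.Ceiling(minHz / binWidth));
        int last = Math.Min(spectrum.Length - 1, (int)Math.Floor(maxHz / binWidth));

        int best = -1;
        float max = 0f;
        for (int j = first; j <= last; j++)
        {
            if (spectrum[j] > max)
            {
                max = spectrum[j];
                best = j;
            }
        }
        return best;
    }
}
```

A call such as BandLimitedPeak.StrongestBinInBand(data, 44100, 60f, 400f) would ignore a loud peak near 800 Hz and report the strongest bin inside the 60-400 Hz band instead. The returned index still has to be multiplied by the bin width to get a frequency in Hz.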