Hello. I am working on an interactive character with stylized real-time lip-sync.
Essentially, I need the character to move its mouth when I speak. I DO NOT want to use KeywordRecognizer, since that relies on Windows speech recognition to detect whole words, which I believe overcomplicates what I am after.
So far I have been able to make the mouth do a basic up-and-down movement: Unity can measure how loud the input from the mic is and move the mouth based on that. It looks good, but I want Unity to recognize phonemes, or mouth sounds. Does anybody have an idea of how this can be done? No need to ask how I want to blend the animation; I just need to know how to get Unity to recognize phonemes like ee, ay, ir, ew, or ss. It doesn't need to be spot-on either.
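For reference, my current amplitude-based version looks roughly like the sketch below (names like `jaw`, `maxOpenAngle`, and `sensitivity` are just placeholders for however your rig is set up):

```csharp
using UnityEngine;

// Rough sketch of the amplitude-driven jaw movement I have working now.
public class MicLoudnessMouth : MonoBehaviour
{
    public Transform jaw;            // bone or sprite to rotate
    public float maxOpenAngle = 20f; // placeholder max jaw rotation in degrees
    public float sensitivity = 50f;  // scales raw loudness into 0..1

    AudioClip micClip;
    string device;
    const int SampleWindow = 128;

    void Start()
    {
        device = Microphone.devices[0];                     // first available mic
        micClip = Microphone.Start(device, true, 1, 44100); // 1 s looping buffer
    }

    void Update()
    {
        float open = Mathf.Clamp01(GetMicLoudness() * sensitivity);
        jaw.localRotation = Quaternion.Euler(open * maxOpenAngle, 0f, 0f);
    }

    // Average absolute amplitude over the most recent mic samples.
    float GetMicLoudness()
    {
        int pos = Microphone.GetPosition(device) - SampleWindow;
        if (pos < 0) return 0f; // buffer not filled yet

        float[] samples = new float[SampleWindow];
        micClip.GetData(samples, pos);

        float sum = 0f;
        foreach (float s in samples) sum += Mathf.Abs(s);
        return sum / SampleWindow;
    }
}
```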
The only idea I have (which I have absolutely no idea how to implement) is to pre-record myself pronouncing each phoneme separately and then have Unity compare my real-time speech against the pre-recorded clips to find the most similar waveform. The result would then tell the character which mouth shape to make. As a beginner at coding I wouldn't know where to start with that, but it was an idea lol.
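To make the idea concrete, the sketch below is the kind of thing I'm picturing: compute a crude frequency profile of the last chunk of mic input and pick the closest pre-computed profile. Everything here is hypothetical (`phonemeNames`, `templates`, the band count); `templates` would be filled offline by running the same `ComputeBands` over the pre-recorded phoneme clips:

```csharp
using UnityEngine;

// Sketch of the "compare live audio against pre-recorded phonemes" idea.
public class PhonemeMatcher : MonoBehaviour
{
    const int SampleWindow = 512;
    const int BandCount = 16;

    public string[] phonemeNames; // e.g. { "ee", "ay", "ir", "ew", "ss" }
    public float[][] templates;   // one BandCount-long profile per phoneme,
                                  // precomputed from the recorded clips

    AudioClip micClip;
    string device;

    void Start()
    {
        device = Microphone.devices[0];
        micClip = Microphone.Start(device, true, 1, 44100);
    }

    void Update()
    {
        int pos = Microphone.GetPosition(device) - SampleWindow;
        if (pos < 0 || templates == null) return;

        float[] samples = new float[SampleWindow];
        micClip.GetData(samples, pos);

        int best = BestMatch(ComputeBands(samples));
        // Drive the mouth shape from phonemeNames[best] here.
    }

    // Crude band energies via a naive DFT; bins 4, 8, ..., 64 span roughly
    // 345 Hz to 5.5 kHz at 44.1 kHz, enough to tell a hiss from a vowel.
    static float[] ComputeBands(float[] s)
    {
        float[] bands = new float[BandCount];
        for (int b = 0; b < BandCount; b++)
        {
            float w = 2f * Mathf.PI * (b + 1) * 4 / s.Length; // angular freq per sample
            float re = 0f, im = 0f;
            for (int n = 0; n < s.Length; n++)
            {
                re += s[n] * Mathf.Cos(w * n);
                im -= s[n] * Mathf.Sin(w * n);
            }
            bands[b] = Mathf.Sqrt(re * re + im * im);
        }
        return bands;
    }

    // Cosine similarity: compares only the shape of the spectrum,
    // so overall loudness shouldn't throw off the match.
    int BestMatch(float[] live)
    {
        int best = 0; float bestScore = float.MinValue;
        for (int i = 0; i < templates.Length; i++)
        {
            float dot = 0f, a = 0f, b = 0f;
            for (int k = 0; k < BandCount; k++)
            {
                dot += live[k] * templates[i][k];
                a += live[k] * live[k];
                b += templates[i][k] * templates[i][k];
            }
            float score = dot / (Mathf.Sqrt(a * b) + 1e-6f);
            if (score > bestScore) { bestScore = score; best = i; }
        }
        return best;
    }
}
```

No idea if a naive band comparison like that is anywhere near robust enough in practice, so if there's a more standard way to do this in Unity I'm all ears.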