[WIP] Cheshire Lip-Sync

Huzzah! After quite a bit of work, the Cheshire Unity plug-in is available for download on the Unity Asset Store. It is quite rough, but it's in a usable state. I will be fielding questions about the component here and in a thread I plan on creating on the Asset Store board.

You can find documentation for Cheshire on the Mad Wonder website. There are no message boards there yet; there just isn't enough content or interest to warrant them.

I would appreciate whatever feedback the user-base here on the Unity boards could provide. I am very new at this, and everything is extremely rough and seat-of-my-pants. There’s still a lot of documentation I have to do.

[UPDATE: 03/13/2015]
After a bit of a hiatus, I’ve actually produced some new visuals. Here’s a model I’m working on to serve as the new “mascot” and example for the plug-in.

It’s a kitty! I felt this would be an appropriate choice given the name of the plug-in. Also, I like cats.

I would be interested in this. One question: what requirements will a mesh need to use it?

Does this use regular Unity blend shapes? And how much are you intending to ask for it? It seems I'll be pouring money into this subject, since I'd like something a lot sooner than the end of May, but I'd also like something that uses visemes, as yours seems to. I'm assuming you're using audio analysis. Ah well, it looks good; if it turns out well I might pick it up, because why not, eh?

At the moment I don't have any plans to charge for it. The phoneme timing is calculated using an open-source command-line program that Annosoft released back in 2005. The program is free to use, and last year I wrote a basic GUI front-end for it that makes it a bit more user-friendly.

The current requirement is that the model in question have 9 valid shape keys. That's pretty much it. When you assign the script to a model, it automatically detects whether the model has a Skinned Mesh Renderer and whether there are 9 shape keys. If those criteria aren't met, the "Create Animation" button is disabled. Under the "Mouth Shapes" section you just select which shape key you want paired with which standard phoneme. When you click the "Create Animation" button, it pops up a little editor window where you choose the text data file and Audio Clip that you want to use for your animation. The script then takes the timing data from the data file and creates a Mecanim-standard Unity animation file. The animation file is just like any other standard Unity animation, and can be altered in the standard Unity animation window.
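To illustrate the validation step, here's a minimal sketch of how a component might check a model before enabling the "Create Animation" button. The class and method names are hypothetical, not the actual Cheshire source; it only assumes Unity's standard `SkinnedMeshRenderer` and `Mesh.blendShapeCount` API.

```csharp
using UnityEngine;

public class BlendShapeCheck : MonoBehaviour
{
    // Cheshire requires 9 mouth shape keys on the target model.
    public const int RequiredShapeCount = 9;

    // Returns true only when the model carries a Skinned Mesh Renderer
    // with at least the nine required blend shapes.
    public bool ModelIsValid()
    {
        var skinnedRenderer = GetComponent<SkinnedMeshRenderer>();
        if (skinnedRenderer == null || skinnedRenderer.sharedMesh == null)
            return false;
        return skinnedRenderer.sharedMesh.blendShapeCount >= RequiredShapeCount;
    }
}
```

An editor script could then call `ModelIsValid()` and grey out the button when it returns false.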

That's pretty cool. What's left for you to do? Does it need any battle testing? hint hint haha. I'm using Daz models, and the Genesis system onwards has a pretty strong set of visemes, or at least that's what they seem to be labelled as. A quick wiki check reveals that a viseme can be the visual appearance of several phonemes, but I guess the point is I'd find such a thing very useful, so I'll watch out for this in the future.

Great, does it sync with text-only input too?

My game does not have voice audio for every line of text shown, for example.

Unfortunately, no. This is not a text-to-speech kind of thing.

However, if you have the patience, you could do the voice work yourself, and then just input the timing data without using any audio. I am allowing for that scenario. The audio playback is handled as an animation event at the beginning of the animation tied to a function in the script. If you don’t provide an audio clip during the creation process, the animation will be created without that event, and no audio clip will be required. I threw that in just in case anyone wanted their characters to “mouth” the words without actually playing any audio.

This would still require you to record some audio and extract the timings from it, but the audio would never have to be imported into Unity. Once you had the text file with the timing data, you could forget about the original audio if you don’t need it for your project.
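The optional-audio behaviour described above could be sketched roughly like this. This is a hypothetical illustration, not the actual Cheshire code; the `PlayVoice` function name is an assumption, and it only relies on Unity's standard `AnimationClip.AddEvent` and `AnimationEvent` API.

```csharp
using UnityEngine;

public static class AudioEventHelper
{
    // Attaches a playback event at the very start of a generated clip,
    // but only when an AudioClip was actually supplied. Without audio,
    // the character simply "mouths" the words.
    public static void AddPlaybackEvent(AnimationClip clip, bool hasAudio)
    {
        if (!hasAudio)
            return;

        var playbackEvent = new AnimationEvent
        {
            time = 0f,                   // fire on the first frame
            functionName = "PlayVoice"   // assumed receiver method on the component
        };
        clip.AddEvent(playbackEvent);
    }
}
```

Because the event lives inside the animation itself, dropping the clip into any Animator that has the receiving script will trigger the audio automatically.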

That would be perfect for me, really. I'd like some audio for the voice work, but I didn't really want proper human voices; initially I was thinking of the typical Animal Crossing style sounds, or something like Simlish. But I don't think my pals and I would mind at all if we could record the voice acting, process it with your tool, and then process the audio so it sounds a great deal stranger. Mind, I imagine this might work with the voice just being kind of gibberish anyway? Either way, I love the idea of recording the voices this way and then processing them for interesting voice effects; gibberish with vocoders but with convincing lip sync sounds like a fun start to me. I'd really like to have a good shot at this when it's available, cheers! I've got to deal with facial expressions too, but there's another nice free tool I could adapt for that with Unity.

wow cool

Yes, actually, it probably would work with gibberish. The command-line program that extracts the timings operates on a set of basic vocal sounds. As long as the gibberish features some of those sounds, it should still work. It works better with a text transcription, but it doesn't require that transcription to be proper English.

Before the weekend I was able to iron out a major bug that I had run across. Unfortunately, I was busy over the weekend (re-caulking my shower), so I wasn't able to get much work finished.

I’m getting a lot closer to being ready to go.

Unity Lip Sync Demo

Here's a quickie Unity Web Player demo showing the finished results of animations created using this tool. All of these animations are unaltered; I didn't go back to clean them up after the fact (though you can, via Unity's standard animation window). It will hopefully provide an idea as to what I'm shooting for.

All of these animations took only a minute or two to create. It took longer to record and process the audio to my satisfaction than it did to get the animations running in Unity. For anyone who’s curious, the spoken text is the first paragraph of H. P. Lovecraft’s “Polaris.” It’s better to stick with the public domain for demo purposes. Star Wars quotes would have been more fun, but you never know when the lawyers might be looking. Besides, more complicated sentence structure makes for a better stress-test.

This looks great, and very professional. Hopefully there will be a video tutorial on how to set your model up for this.

Any thoughts on adding Eyebrow, Cheek and Eyelid movement for added expression? Would help with some of the stiff characters I still see in modern video games.

Best of luck!

I will touch on it briefly, and may indeed put together a basic video tutorial for Blender (the 3D program I'm using). The actual 3D model creation isn't the focus of this plug-in, and setting up models for it is a fairly straightforward process. All you need to do is create the 9 blend shapes in your 3D program. That's pretty much it. The script detects the names of the blend shapes from the imported 3D model and lets you match them up with the various mouth sounds, so there are no specific names you have to use. Set-up is extremely basic. I wanted things to be as simple for the end-user as possible.

The script just uses 9 basic mouth blend shapes. It isn't designed to add any other animations, just the mouth movements. But since the end product of the script is a Unity animation file, it is possible to open the animation in the animation editor afterwards and do whatever you please. If you want to have more than 9 blend shapes on your model for more complex facial animations, that's up to you. You can create the mouth animation file, and then go in and add whatever further facial animation you want. I would personally recommend some blinking, smiling, frowning, and eyebrow blend shapes. Additional blend shapes can be combined with the mouth animations to create more nuanced performances; they just won't be generated automatically.
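As a concrete example of layering extra expression on top of a generated clip, here's a minimal sketch that adds a quick blink curve. It's hypothetical, not part of Cheshire itself: it assumes a blend shape literally named "Blink" on the model, and uses only Unity's standard `AnimationClip.SetCurve` API with the `blendShape.<name>` property path convention.

```csharp
using UnityEngine;

public static class BlinkLayer
{
    // Adds a short blink (close and reopen over ~0.1 s) at blinkTime,
    // on top of whatever mouth curves the clip already contains.
    public static void AddBlink(AnimationClip clip, float blinkTime)
    {
        var blinkCurve = new AnimationCurve(
            new Keyframe(blinkTime - 0.05f, 0f),
            new Keyframe(blinkTime, 100f),      // blend shape weights run 0-100
            new Keyframe(blinkTime + 0.05f, 0f));

        // Empty relative path targets the same object the clip animates.
        clip.SetCurve("", typeof(SkinnedMeshRenderer),
                      "blendShape.Blink", blinkCurve);
    }
}
```

The same pattern works for smiles, frowns, or eyebrow shapes; only the property name changes.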

After a little research, I decided that XML would not be the way to go. Parsing XML in C# is frankly a bit of a pain. It seems like more trouble than it's worth, and overkill for the purposes of this project, so I'll be sticking with basic CSV files. I still have a bit more testing to do, but I'm going to be working on the tutorials and documentation this week.
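To show why CSV is the simpler choice here, this is a minimal sketch of parsing one timing line. The field layout (start time in seconds, then a phoneme symbol) is an assumption for illustration; the real Cheshire file format may differ.

```csharp
using System;
using System.Globalization;

public struct PhonemeEvent
{
    public float StartTime;   // seconds into the audio clip
    public string Phoneme;    // symbol such as "AI", "E", or "O"
}

public static class TimingParser
{
    // Parses a single CSV line of the assumed form "0.25,AI".
    public static PhonemeEvent ParseLine(string line)
    {
        string[] fields = line.Split(',');
        return new PhonemeEvent
        {
            // InvariantCulture keeps "0.25" parsing correctly regardless
            // of the user's system locale.
            StartTime = float.Parse(fields[0], CultureInfo.InvariantCulture),
            Phoneme = fields[1].Trim()
        };
    }
}
```

A `Split`, a `Parse`, and a `Trim` cover the whole job; the equivalent XML reader would need a schema's worth of ceremony for the same two fields.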

It’s awesome, but the face in the demo is like one of those story teller toys haha (:

Yes, I can’t pre-program different expressions into the face. That will have to be left up to the end-users. The nice thing is that they can easily add their own additional blend shapes to put a bit more “life” into the 3D models. (smiles, blinking, etc…)

I've been preoccupied with personal matters recently, so I haven't made as much progress as I would like. Also, I've reached the point where I'm working on documentation, which is far more tedious than the fun of problem-solving. But I am still working on this project, and trying to package it up for an Asset Store release.

Quick update. I’m still at it. I’ve finished up about 50% of the documentation. Last night I went back to my example 3D model and optimized the UV mapping. Today I’m going to try to re-paint the basic skin for it. I’m not going to get too fancy, it’s just there for an example.

I still have to go through the submission guidelines and produce some of the graphics to go along with the submission. But I’m in the home stretch now.

Whooo hooo! My very first Unity Asset Store component! Cheshire, the Unity lip-sync component, has made its debut on the Unity Asset Store.

It’s free, so give it a download if you’re interested and take it for a spin. Documentation can be found at the Mad Wonder website, as well as the Windows application that goes along with it.

Post any questions, complaints, or requests for more documentation here on the Unity boards.

I just got it off the asset store today and everything looks good until I try to actually create the animation. I get IndexOutOfRangeException: Array index is out of range.

And it takes me to every line like this: processCurves[_targetCat.GetAdjustedBlend(freshSymbols*)].AddKey(freshFrame);*

I’m not a programmer so I’m not sure what to do about it. Overall everything else looks wonderful! I attached the example animation to my character and it worked great!

freshSymbols is an array that stores the various phonemes extracted from the text file for processing. It’s supposed to be referenced in the code like this…

freshSymbols[i]

I'm not sure what use case would result in the line of code you've posted. I checked through the code and couldn't find any point where that array access appears without an index. Can you give me more details, and can you post the text file you're trying to process? I suspect there may be a conflict there that's causing the plug-in to hang.

Never mind, I just figured out why that snippet came out the way it did. The BB code for this message board automatically interprets it as italics. Very annoying. Anyway, I would still like to see the text file you're trying to turn into an animation. It's possible that it's feeding in a symbol that the plug-in doesn't recognize. Also, is there a specific number of times the stated error repeats? That information could help me narrow down where the bug is occurring.
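For what it's worth, one defensive approach would be to skip unrecognized symbols rather than index with an invalid value. A hypothetical sketch follows; the names mirror the snippet above, but this is not the actual Cheshire source, and the lookup is passed in as a delegate just to keep the example self-contained.

```csharp
using System;
using UnityEngine;

public static class SafeKeyAdder
{
    // Bounds-checked version of the failing loop: an unknown phoneme
    // symbol produces a warning instead of an IndexOutOfRangeException.
    public static void AddKeys(AnimationCurve[] processCurves,
                               string[] freshSymbols,
                               Keyframe freshFrame,
                               Func<string, int> getAdjustedBlend)
    {
        for (int i = 0; i < freshSymbols.Length; i++)
        {
            int blendIndex = getAdjustedBlend(freshSymbols[i]);
            if (blendIndex < 0 || blendIndex >= processCurves.Length)
            {
                Debug.LogWarning("Unrecognized phoneme symbol: " + freshSymbols[i]);
                continue; // skip this entry rather than throwing
            }
            processCurves[blendIndex].AddKey(freshFrame);
        }
    }
}
```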

My mistake, the line it takes me to is:
processCurves[_targetCat.GetAdjustedBlend(freshSymbols)].AddKey(freshFrame);

and if I comment the first one out, it just takes me to the next line that's the same as the first. And there's only 1 error every time I press Create Animation (the button that appears after you plug in the audio and txt files).

I'm using the example audio and .txt files. I watched the video posted in the 2nd review and did it the same way they did, but I just keep getting the error IndexOutOfRangeException: Array index is out of range.