I want to create facial animation for characters at runtime, driven by an input audio file. How should I proceed?
Typically you need to find a model (most likely a transformer or another time-series model) that takes a WAV file as input and outputs blendshape weights or skinning deformations.
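To make the data flow concrete, here is a rough sketch of how such a pipeline is usually wired up: slice the audio into one window per animation frame, run each window through the model, and get back a set of blendshape weights per frame. Everything named here is an assumption for illustration. `predict_blendshapes` is a hypothetical stand-in that just maps loudness to a single `jawOpen` weight; a real trained model would replace it, and the gain of 4.0, the 16 kHz sample rate, and the 30 fps frame rate are arbitrary choices.

```python
import math


def frame_audio(samples, sample_rate, fps):
    """Split mono samples into one non-overlapping window per animation frame."""
    hop = sample_rate // fps  # samples covered by one animation frame
    return [samples[i:i + hop] for i in range(0, len(samples) - hop + 1, hop)]


def rms(window):
    """Root-mean-square loudness of one window of samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in window) / len(window))


def predict_blendshapes(window):
    """Hypothetical placeholder for a trained audio-to-face model.

    It only maps loudness to a single 'jawOpen' weight clamped to [0, 1].
    The gain of 4.0 is an arbitrary assumption, not a tuned value.
    """
    return {"jawOpen": min(1.0, rms(window) * 4.0)}


def animate(samples, sample_rate=16000, fps=30):
    """Return one dict of blendshape weights per animation frame."""
    return [predict_blendshapes(w) for w in frame_audio(samples, sample_rate, fps)]


if __name__ == "__main__":
    sr = 16000
    # One second of a 220 Hz tone whose amplitude ramps up over time.
    samples = [(n / sr) * 0.25 * math.sin(2 * math.pi * 220 * n / sr)
               for n in range(sr)]
    frames = animate(samples, sr, 30)
    print(len(frames), frames[0], frames[-1])
```

At runtime you would apply each frame's weights to the character's face rig in sync with audio playback. Swapping in a real model only changes `predict_blendshapes`; the framing and per-frame output structure stay the same.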
There is an ongoing thread here: