Hi!
So I’m trying to get Salsa working with Dissonance and Netcode to sync the Salsa lipsync over a network so all players can see the other avatars’ lips move. I’m using Reallusion CC4 characters.
I’ve followed the Salsa-Dissonance setup (Dissonance Voice Chat - SALSA LipSync Suite v2), replacing the HLAPI objects with the Network Object scripts. I’ve also overridden ServerIsAuthoritative on the Animator component of the avatar so I can control animation on the clients without the server.
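For reference, the override is roughly this (the class name is just a placeholder; this is the usual Netcode for GameObjects pattern as far as I know, so treat it as a sketch rather than my exact script):

```csharp
using Unity.Netcode.Components;

// Sketch of the owner-authoritative animator mentioned above: overriding
// OnIsServerAuthoritative lets the owning client drive the Animator instead
// of the server.
public class OwnerNetworkAnimator : NetworkAnimator
{
    protected override bool OnIsServerAuthoritative()
    {
        return false;
    }
}
```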
The project is fresh, all new downloads. Including the One-Click setups.
I start a server and a client. The server instantiates the avatar, spawns it with my client ID (so it is the owner).
My network animator works, as I can see the animation chosen on the client mirrored on the server.
Dissonance is properly connected.
Using the flag in the SalsaDissonanceLink, I see in the console that the Salsa Dissonance link is working on the client.
What I do see is that the server keeps looking for the client’s Dissonance player ID in the WaitSalsaDissonanceLink method. That method apparently only sees its local Dissonance ID and not the remote one, so the server never reports that Salsa and Dissonance are linked.
Does this have anything to do with the fact that although I am syncing the Animator, Salsa affects the blendshapes directly (not through the Animator) and thus the blendshapes are not synchronized?
Can you please point me in the right direction and help me find how to solve this?
Cheers,
Marco
Unity 2021.3.22
Dissonance 9.0.1
Salsa 2.5.4.125
Windows 11
Reallusion CC4 character
Unity Netcode for Gameobjects 1.2.0
Hi @Tech-Labs , SALSA working with Dissonance does not sync the blendshapes via the net code. It works by attaching to the serialized audio for each avatar and letting the SALSA instance on each avatar process the serialized audio. Something in the configuration is not quite right if SALSA isn’t linking up by finding the player id. I would suggest checking with the Dissonance guys to find out what might be wrong. Also, keep in mind, for the remote avatars, the SALSA configuration needs to be set to external analysis. It gets the data directly from Dissonance, which is what the SalsaDissonanceLink script is for.
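A rough sketch of what the remote-avatar setup amounts to (assuming the Salsa component exposes an external-analysis flag; the useExternalAnalysis field name below is an assumption on my part, and SalsaDissonanceLink is the add-on script mentioned above):

```csharp
using CrazyMinnow.SALSA;
using UnityEngine;

// Remote-avatar wiring sketch: SALSA stops analyzing a local AudioSource and
// instead consumes the data the SalsaDissonanceLink feeds in from Dissonance.
public class RemoteAvatarSalsaSetup : MonoBehaviour
{
    void Start()
    {
        var salsa = GetComponent<Salsa>();
        salsa.useExternalAnalysis = true; // assumption: external-analysis flag on the Salsa component

        // The link script from the Dissonance add-on pipes Dissonance's per-player
        // audio data into this SALSA instance.
        if (GetComponent<SalsaDissonanceLink>() == null)
            gameObject.AddComponent<SalsaDissonanceLink>();
    }
}
```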
Just to let everyone know where my “mistake” was.
After talking (on Discord) with the Dissonance people, it turns out that the NfgoPlayer component only works on Network Objects that are marked as IsPlayerObject. For me it doesn’t really matter whether they are or not, so instead of SpawnWithOwnership I now use SpawnAsPlayerObject for my avatars. Now the Salsa-Dissonance link works fine!
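In code terms, the change was roughly this (class, method, and prefab names are placeholders):

```csharp
using Unity.Netcode;
using UnityEngine;

// Server-side spawn sketch: spawning the avatar as the client's player object
// marks it IsPlayerObject, which is what Dissonance's NfgoPlayer needs to
// resolve the remote player ID.
public class AvatarSpawner : NetworkBehaviour
{
    public void SpawnAvatar(GameObject avatarPrefab, ulong clientId)
    {
        var netObj = Instantiate(avatarPrefab).GetComponent<NetworkObject>();

        // Before: netObj.SpawnWithOwnership(clientId);  // ownership only, IsPlayerObject stays false
        netObj.SpawnAsPlayerObject(clientId);             // registers it as the client's player object
    }
}
```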
Thanks!
Greetings. I’m experimenting with Jitter and global timing to finesse lip sync in Unity.
These settings are mostly OK, except the avatar moves its lips during silent moments and looks a bit like a goldfish opening and closing its mouth.
I’ve tried adjusting silence and turning it off/on without effect. I haven’t yet found the source settings to prevent this. Could you please suggest which setting(s) to look at?
Hello! It might be that your audio has a noise floor that SALSA is picking up on. You can raise the floor using the Dynamics scaled cutoffs section. Raise the left value up to something that is above the floor, but still detects the audio you want. Hope that helps!
Ah, thank you. I tried setting this higher, but to no effect. The audio is received from Azure text to speech.
If I deactivate the Salsa component then re-activate during runtime, the looping lip animation stops.
This almost appears to be a loop of a few phonemes.
I’ll match all settings with another avatar that is closer to defaults to test.
Ah, OK, so you are leveraging a custom audio filter chain insert to access the audio? If this is the case, it sounds like your buffer isn’t being filled with “zeros” during silent periods or when the TTS stops.
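If that is the setup, a minimal sketch of the zero-fill idea looks like this (class and queue names are placeholders; only OnAudioFilterRead is the actual Unity callback):

```csharp
using System.Collections.Generic;
using UnityEngine;

// When no TTS samples are queued, write zeros into the buffer so SALSA
// analyzes true silence instead of stale samples.
public class TtsAudioFeeder : MonoBehaviour
{
    private readonly Queue<float> pending = new Queue<float>();

    void OnAudioFilterRead(float[] data, int channels)
    {
        lock (pending)
        {
            for (int i = 0; i < data.Length; i++)
                data[i] = pending.Count > 0 ? pending.Dequeue() : 0f; // zero-fill during silence
        }
    }
}
```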
Currently I’m using RT-Voice + Azure + SALSA. The text comes from ChatGPT; the avatars have facial expression animation loops driven from a behaviour tree, and I use persistence in Salsa + Lips-only Process Queue.
I set the avatar as the AudioSource and use the Speak command on it once the chat text is returned.
I might be missing a step, like adding an additional component. This process is revealing the “shaping” possible with SALSA to reduce choppiness or timing to suit the character of the model.
OK, so not MS’s Cognitive Services. So your audio is coming back as a complete clip and not streamed. SALSA only analyzes the AudioSource clip while it is playing. So some data has to be pouring through the AudioSource for the lips to be moving. Sometimes when using a microphone, which uses a circular buffer, the “playback head” pointer can get mismatched with (in front of) the “record head” pointer and produce the symptoms you see. But, I cannot think of how listening to a normal AudioSource clip is causing animation in SALSA for silent audio.
Now, considering your animation loops, it is possible you have an animation that affects the “mouth” that SALSA is not controlling, and therefore you are seeing that animation. When you are seeing the silent lip movement, try disabling your animator to see if the movement stops.
Thank you for replying and for your advice. I do notice the lips do not move (which is correct) until the audio from Azure is received and played through that avatar’s AudioSource via RT-Voice and the Azure prefab.
After the Azure-provided audio finishes, we have the Goldfish Effect. I’ll try disabling animation and see what the effect is.
My other avatars, with settings closer to (or the same as) the one-click add-ons, also use the same behaviour tree and animation loops, but they also have most of the Process Queue items selected. I’ll compare these.
Follow up: turning off behaviours and animations didn’t affect the mouth blendshape loop.
Turning off the Queue Processor stops the loop; turning QP on again continues the mouth movement.
Disabling Salsa then turning it on again stops the mouth looped motion.
Next test: I’ll reimport a CC4 character and redo the scenario, to see if I mixed presets with CC3 or iClone.
OK, good, now disable the AudioSource to see if there is something popping that you can’t really hear. If disabling the AudioSource makes the fish mouth stop, something is being fed in there that you don’t want. You can also watch the analysis values in SALSA to see if it is getting something (option only available when running).
Additionally, watch the blendshapes in the SMR and see which blendshape(s) is/are moving. If it is in the QP, you should (maybe) see it cycling in there as well, although that might be going too fast to show up in the list.
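If it helps, a quick throwaway watcher (not part of SALSA; names are placeholders) can log whichever blendshape weight is changing from frame to frame:

```csharp
using UnityEngine;

// Debug helper: logs any blendshape on the assigned SkinnedMeshRenderer whose
// weight changed since the last frame, so the cycling shape is easy to spot.
public class BlendshapeWatcher : MonoBehaviour
{
    public SkinnedMeshRenderer smr; // assign the face SMR in the inspector
    private float[] lastWeights;

    void LateUpdate()
    {
        int count = smr.sharedMesh.blendShapeCount;
        if (lastWeights == null || lastWeights.Length != count)
            lastWeights = new float[count];

        for (int i = 0; i < count; i++)
        {
            float w = smr.GetBlendShapeWeight(i);
            if (!Mathf.Approximately(w, lastWeights[i]))
                Debug.Log($"{smr.sharedMesh.GetBlendShapeName(i)}: {w:F2}");
            lastWeights[i] = w;
        }
    }
}
```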
These are great tests. Thank you for pointing these out.
Disabling the AudioSource has no effect on the fish mouth.
Analysis value remains at 0, no bar on the display.
I’ll check docs for what the SMR is.
Question: Ideally I’d like to keep expression animation curves around the mouth, like (slightly) smiling, frowning, neutral, while SALSA drives the lip sync phoneme tracking.
The goal is to overlap expression with words; is there a way to select the balance between them?
For example:
Animation curves own smile left & right, while SALSA owns lip motion, with a slider to blend between the two and balance them. I find lip motion often doesn’t need to be exact to feel real.
Note:
This is what Visemes do; I wonder if persistent animation balance can be controlled, and if Visemes influence blendshapes on both sides of the mouth.
SMR = SkinnedMeshRenderer, which is where the blendshapes are interfaced. If the AudioSource is disabled and the fish mouth continues, it must be driven by something else (not SALSA). The only other thing would be EmoteR: I see you don’t have an EmoteR linked to SALSA, but do you have one running, with the emote perhaps set to random or cyclic firing? Another option might be a misconfigured viseme that has min as non-zero, which would leave the blendshape enabled at the end of animation, but wouldn’t cycle it. So I’m at a loss.
To answer your question, you can do what you are looking for, and it sounds like you’re somewhat there. SALSA technically works better with EmoteR, since conflicts are handled in the QP hierarchy. But it would work with animations as long as SALSA is able to write to the animation last per frame. This may or may not be possible depending on how animations are driven. SALSA Suite runs in the LateUpdate cycle, so as long as your animation writes in Update, SALSA can override it when necessary and merge back to it when not in use. The only other way to deal with it, if it isn’t possible to guarantee SALSA writes last, is to not use the same blendshapes in SALSA that you are using in your animation.
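To illustrate the write-order point, a rough sketch (class and blendshape names are placeholders, not part of SALSA): an expression written in Update leaves SALSA’s LateUpdate pass free to override the shared blendshape on the same frame.

```csharp
using UnityEngine;

// Expression driver sketch: writes a persistent "smile" weight in Update so
// SALSA (running in LateUpdate) gets the last word on the shared blendshape.
public class ExpressionDriver : MonoBehaviour
{
    public SkinnedMeshRenderer smr;
    public int smileBlendshapeIndex;              // e.g. a CC4 "Mouth_Smile_L" shape
    [Range(0f, 100f)] public float smileWeight = 30f;

    void Update()
    {
        smr.SetBlendShapeWeight(smileBlendshapeIndex, smileWeight);
    }
}
```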
I’ve pretty much exhausted ideas at this point. If EmoteR isn’t enabled and there aren’t any misconfigured visemes, it will be necessary for you to send me your project. Preferably as small of a sample of your project as possible that still demonstrates the problem. Send it to our support email address with your Invoice number and a reference to this thread and I’ll be happy to take a look.
I am currently using EmoteR in my Unity project (Unity version: 2021.3.18f, SALSA LipSync Suite version: 2.5.4).
I am encountering an issue related to blend shape values and minShape that I could not find any existing discussion or solution for in the forum.
In my project, I am using the following code to manually trigger an emote:
emoter.ManualEmote(emoteName, ExpressionComponent.ExpressionHandler.RoundTrip, 1f);
The issue arises when I set a minShape value that is not zero. When minShape is set to 0, upon the completion of the emote, the blend shape value correctly returns to 0. However, when I set minShape to a value other than zero, for instance 0.5, I observed that the blend shape value does not return to the minShape value of 0.5 upon the completion of the emote. Instead, it returns to 0.75. Additionally, I tested this behavior with both ‘Cubic Out’ and ‘Linear’ easing settings and observed the same result in both cases.
I am unsure if this is an intended behavior or if there is an underlying issue that I am not aware of. I would be grateful if anyone could provide insight into this behavior and if there is any solution or workaround to ensure the blend shape value returns to the minShape value after an emote finishes playing.
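For reference, a minimal repro sketch of what I am doing (emote and blendshape names are placeholders):

```csharp
using CrazyMinnow.SALSA;
using UnityEngine;

// Fires the round-trip emote as described above, then logs the blendshape
// weight well after completion to check whether it returns to minShape.
public class MinShapeRepro : MonoBehaviour
{
    public Emoter emoter;
    public SkinnedMeshRenderer smr;
    public string emoteName = "exampleEmote"; // placeholder: an emote configured with minShape = 0.5
    public int blendshapeIndex;               // placeholder: the shape driven by that emote

    void Start()
    {
        emoter.ManualEmote(emoteName, ExpressionComponent.ExpressionHandler.RoundTrip, 1f);
        Invoke(nameof(LogWeight), 3f); // well after the 1-second round trip has finished
    }

    void LogWeight()
    {
        // Per the report above, the value settles above the configured minShape
        // instead of returning to it.
        Debug.Log(smr.GetBlendShapeWeight(blendshapeIndex));
    }
}
```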
Hello, I am a new user and my models are UMA. Can SALSA be used to simulate talking without any audio files? In my case no syncing is needed, just randomized lip movement that feels like talking; all I need to control is the length, plus inserting expressions here and there.
Hello @hahasohano , this is not intended behavior and I can reproduce your issue. Typically, the min/start is 0 unless it is merging back to an external influencer (e.g. Animator, higher QueueProcessor priority, etc.). I will have to dig in and see what is causing this. Thanks for pointing it out. I don’t have an ETA at this point because I don’t know what the issue is. But hopefully I can get it into the next bug-fix update.