Disclaimer: I am still learning, but maybe something in here will help you work it out.
Not sure if it’s the same thing, but do you know how avatars and humanoid animation clips work? The avatar defines a standard set of ‘muscles’ that all humanoid characters share, so animation clips can be more portable even if the bones are a bit different across models. In addition, there are avatar masks you can apply to a track (or override track) that say ‘this animation clip only drives some parts, like the legs or the head’. Without a mask, Unity seems to assume the animation clip is going to control everything, and when the clip does not set values for the other bones, they go into weird default positions. I am wondering if this is what you are seeing. If so, you might like to try creating an avatar mask for just the head and assigning it to the layer with the blendshapes. I think that way Unity knows the clip is only meant to control part of the overall character.
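To make the mask idea a bit more concrete, here is a toy sketch in plain Python. It is not Unity code and not how Unity is actually implemented; the part names, the `apply_layer` function, and the numbers are all made up for illustration. It only models the behaviour described above: an unmasked layer claims every part and falls back to a default where the clip has no value, while a masked layer leaves everything outside the mask alone.

```python
# Toy model only: plain dicts stand in for poses. Nothing here is Unity's API.

# Hypothetical "default" value a part falls back to when a clip claims it
# but provides no value for it.
DEFAULT_POSE = {"head": 0.0, "left_arm": 0.0, "legs": 0.0}


def apply_layer(base_pose, clip_values, mask=None):
    """Return the pose after an override layer plays a clip over base_pose.

    mask is a set of part names the layer may control.
    mask=None models a layer with no mask, which claims every part.
    """
    result = {}
    for part, base_value in base_pose.items():
        controlled = mask is None or part in mask
        if not controlled:
            # Masked out: the layer leaves the base value untouched.
            result[part] = base_value
        else:
            # Controlled: use the clip's value, or the default if the clip has none.
            result[part] = clip_values.get(part, DEFAULT_POSE[part])
    return result


base = {"head": 0.1, "left_arm": 0.7, "legs": 0.4}  # e.g. a body animation
face_clip = {"head": 0.9}                            # clip that only sets the head

unmasked = apply_layer(base, face_clip)
masked = apply_layer(base, face_clip, mask={"head"})

print(unmasked)  # arm and legs drop to the default values
print(masked)    # arm and legs keep the base values
```

In this toy version the unmasked result loses the arm and leg values from the base pose, which is roughly the "weird default positions" symptom, while the head-only mask keeps them.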
Note: blendshapes (used to control the face) are kind of interesting in their own right, as they are not part of the humanoid avatar system. But I did have some success doing this in a similar sort of project.
Might be interesting; if not, there are a few other videos on that channel.
But if you got the mapping going, you are one up on me! I could not get the mapping right for my character. But I am successfully using another free tool which is good enough for me… for now.