Is It Possible to Automate VRM Animation Processing with Python and a Unity Game Build?

I am working on a project and planning to automate VRM animation processing using Unity as a backend service controlled by Python. The idea is to run Unity’s game build on a server, dynamically load VRM models, apply BVH or FBX animations, and generate an MP4 video, all without using the Unity Editor.

In this setup, Python will launch the Unity game build and send VRM files via HTTP API, WebSocket, or a shared file system. Once Unity receives the file, it will process the VRM using UniVRM, generating Prefabs or other necessary assets dynamically. These assets will then be used for animation processing.
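To make the orchestration side concrete, here is a minimal Python sketch of the launch-and-submit flow described above. The build path, endpoint URL, and JSON job schema are all hypothetical assumptions; the Unity side would need to expose a matching HTTP listener for this to work.

```python
import json
import subprocess
import urllib.request
from pathlib import Path

# Hypothetical locations -- adjust to your build output and to whatever
# HTTP listener you implement inside the Unity player.
UNITY_BUILD = "./RenderServer.x86_64"
API_URL = "http://127.0.0.1:8080/render"


def launch_unity_build(build_path: str = UNITY_BUILD) -> subprocess.Popen:
    """Start the Unity player build (no -batchmode, so rendering stays enabled)."""
    return subprocess.Popen(
        [build_path, "-screen-width", "1280", "-screen-height", "720"]
    )


def build_render_job(vrm_path: str, motion_path: str, out_path: str) -> dict:
    """Describe one render job as plain data the Unity side can parse."""
    return {
        "vrm": str(Path(vrm_path).resolve()),
        "motion": str(Path(motion_path).resolve()),
        "output": str(Path(out_path).resolve()),
    }


def submit_job(job: dict, url: str = API_URL) -> urllib.request.Request:
    """Build the HTTP request carrying the job; send it with urllib.request.urlopen(req)."""
    return urllib.request.Request(
        url,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The shared-file-system variant would simply write the same JSON job next to the VRM file and have Unity poll a watch folder instead of serving HTTP.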

For animation, the system will apply BVH or FBX motion data, which Unity will map to the VRM humanoid rig using its animation system. This should ensure compatibility with standard motion capture data and existing animation libraries.

Since this will run as a full Unity game build and not in headless mode, rendering should be fully supported. The final step is recording the animation and exporting it as an MP4 file, either through Unity Recorder or by exporting frames as PNG images and converting them into a video using FFmpeg.
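For the PNG-frames-to-MP4 route, the FFmpeg step can be sketched as below. The frame naming pattern and encoder settings are assumptions, and `ffmpeg` must be on PATH for the commented-out call to actually run.

```python
import subprocess


def ffmpeg_frames_to_mp4(frame_dir: str, fps: int = 30, out: str = "out.mp4") -> list:
    """Build the ffmpeg command that stitches numbered PNGs into an H.264 MP4."""
    return [
        "ffmpeg", "-y",                       # overwrite output without asking
        "-framerate", str(fps),
        "-i", f"{frame_dir}/frame_%04d.png",  # assumes frame_0000.png, frame_0001.png, ...
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",                # yuv420p for broad player compatibility
        out,
    ]


# To actually encode (requires ffmpeg installed):
# subprocess.run(ffmpeg_frames_to_mp4("frames", fps=60, out="anim.mp4"), check=True)
```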

I would like to know if this is a feasible approach. If there are better alternatives or potential challenges to be aware of, any insights would be helpful.

Generally the answer to “is it possible” is always “yes,” assuming you’re within the constraints of the space-time continuum and the hard-and-fast rules of thermodynamics.

However, it all comes down to cost and risk.

Cost has to do with the depth of your wallet (or your time runway), and risk has to do with the technology itself functioning as necessary for your solution.

Of course, all of this presupposes that there is an actual use for what you describe above. If you’re the end user, GREAT! If you’re not, then as you develop this you probably want to check in with your stakeholders to make sure what you’re making is useful.

I recommend you start today by mocking up and structuring the framework of parts: stub out each step, hard-wire a pre-animated result, then skin it, stage it, run it, film it, compress the MP4, and export it.

As you hook things up, even stub placeholders, you will quickly get a larger view of the problem space and be able to reason better about cost and risk. That is the actual point of prototyping.

So, like all huge things, the journey of a million miles begins with a single step.

Imphenzia: How Did I Learn To Make Games


This sounds like an XY problem; it’s all over the place. Usually in this situation you would be better off explaining what you actually want to achieve.

Do you mean the actual Editor or a build? Are you using the Editor or not?

Are you sending from your own server or from clients? It would be better to use streamable assets if you’re just loading known VRMs.

You shouldn’t have to batch images into a video; you could stream directly over p2p or capture it any other way you want: Unity Render Streaming | Unity Render Streaming | 2.0.2-preview
There’s also: GitHub - vrm-c/UniVRM: UniVRM is a glTF-based VRM format implementation for Unity. English: https://vrm.dev/en/ . Japanese: https://vrm.dev/

Overall, it sounds like you need to ask a better question. I would start with what exactly you’re trying to do, not with how you plan to do it.

I would also recommend tools that render this type of project out of the box. I feel Unity is not the right system here, as it shouldn’t really be acting as your render farm; I’m sure Blender can do this more easily, and headless as well. Blender’s scripting API is also built on Python, so it’s in your wheelhouse: GitHub - saturday06/VRM-Addon-for-Blender: VRM Importer, Exporter and Utilities for Blender 2.93 to 4.3
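If Blender does end up fitting, driving it headless from Python could look roughly like this. The setup-script path and output prefix are placeholders, and `blender` must be on PATH; the script itself would use `bpy` (plus the VRM add-on) to import the model and apply the motion.

```python
import subprocess


def blender_headless_cmd(setup_script: str, out_prefix: str, end_frame: int) -> list:
    """Build a command that runs Blender without a UI, executes a Python
    setup script (e.g. one that imports the VRM and applies the motion),
    then renders the whole animation to numbered frames. Blender processes
    CLI arguments left to right, so --render-anim must come last."""
    return [
        "blender",
        "--background",               # no UI
        "--python", setup_script,     # bpy script that builds the scene
        "--render-output", out_prefix,
        "--frame-end", str(end_frame),
        "--render-anim",
    ]


# To actually render (requires Blender installed):
# subprocess.run(blender_headless_cmd("build_scene.py", "/tmp/frames/frame_", 300), check=True)
```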

Thank you for the response. The final product I am trying to build is prompt-based video generation. I want the program to select a VRM file and a motion file, load them into a rendering platform, render the video, add synchronized audio, and then deliver the final video to the user. So, for the final product, I want to run this on a server. I looked at three.js and Blender for this. I found that three.js runs on the client side, so it would be difficult to add captions or audio after rendering. I tried the Blender API, but rendering was far too slow for real-time use, since it renders frame by frame.
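For the add-synchronized-audio step, one common route is muxing with FFmpeg after the video is rendered. A hedged sketch, assuming `ffmpeg` is on PATH: it stream-copies the video, encodes the audio to AAC, and trims to the shorter input.

```python
import subprocess


def mux_audio_cmd(video: str, audio: str, out: str) -> list:
    """Build an ffmpeg command that combines a silent video with an audio track."""
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", audio,
        "-c:v", "copy",      # keep the rendered video stream as-is
        "-c:a", "aac",       # encode audio for MP4 compatibility
        "-shortest",         # stop at the shorter of the two inputs
        out,
    ]


# To actually mux (requires ffmpeg installed):
# subprocess.run(mux_audio_cmd("anim.mp4", "voice.wav", "final.mp4"), check=True)
```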

I kind of figured out how to do it in Unity Editor mode, but if I want to run this on a server in the future, Editor mode is neither feasible nor practical. For that reason, I am exploring options to render video from a game build, since I want to run this on a server later.

Unity Render Streaming seems to be a p2p stream. I need to look into it more, but it doesn’t seem to fit what I am planning to build.

Thanks for your kind words. That’s why I asked this question. It may sound silly to experts, but at least I can figure out what kind of questions I should ask later on.