Hello,
I followed the keynote and especially liked the pose estimation demo presented here: https://www.youtube.com/live/8ZIdejTiXAE?feature=shared&t=2882 Do you know where I can find more information on this project?
Hey there, sure thing. Below is a breakdown of how it works, and you can email the indie creator Julian here if you have more questions: julianngthowhing@gmail.com. We’ll be releasing a recording from a talk at Unite soon where we show the full demo and explain the below as well.
About
This project is called “Muscle Vision” and was created by an indie dev named Julian Ng-Thow-Hing from San Francisco, CA. He has been collaborating with Dr. Antoine Falisse at the Neuromuscular Biomechanics Lab at Stanford. The lab is focused on applying technology to human ergonomics and movement.
Problems to solve
Julian wanted to solve the problem of how a patient could more accurately perform physical therapy at home. He thought there should be a better way to validate at-home practice so you can follow your therapist’s advice, instead of looking at a traditional printed PDF or watching YouTube videos. Not requiring any additional or specialized equipment would be ideal as well.
Solution
This problem of tracking body orientation and muscle activation is impractical to solve with traditional code, but it is a good fit for a neural network. Extracting pose information and grading a given movement relative to a reference would be very tedious with traditional rule-based programming, as it would require a large number of hand-written heuristics.
Instead, Julian worked with the folks at Stanford and used their software “OpenSim” to generate training data for his LSTM neural network model. It can predict over 80 muscle activations on the lower half of the body, based on movements over time.
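To make the "movements over time in, muscle activations out" idea concrete, here is a minimal sketch of an LSTM forward pass in plain Python. It is not Julian's model: the weights are random and untrained, biases are left out for brevity, and the sizes (24 input features per frame, 16 hidden units, 80 outputs, a 30-frame window) are assumptions picked only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyLSTM:
    """Single-layer LSTM with a sigmoid readout. Weights are random, not trained."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One weight matrix per gate, applied to [current input, previous hidden state].
        self.gates = {
            name: rand_matrix(n_hidden, n_inputs + n_hidden)
            for name in ("input", "forget", "cell", "output")
        }
        self.readout = rand_matrix(n_outputs, n_hidden)
        self.n_hidden = n_hidden

    def predict(self, sequence):
        """Consume a sequence of per-frame feature vectors, return one value per muscle."""
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state (the "memory" carried across frames)
        for frame in sequence:
            x = list(frame) + h
            i = [sigmoid(v) for v in matvec(self.gates["input"], x)]
            f = [sigmoid(v) for v in matvec(self.gates["forget"], x)]
            g = [math.tanh(v) for v in matvec(self.gates["cell"], x)]
            o = [sigmoid(v) for v in matvec(self.gates["output"], x)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Squash the readout so each "activation" lies between 0 and 1.
        return [sigmoid(v) for v in matvec(self.readout, h)]


# Hypothetical sizes: 24 orientation features per frame, 80 muscles, 30 frames.
model = TinyLSTM(n_inputs=24, n_hidden=16, n_outputs=80)
window = [[0.01 * t] * 24 for t in range(30)]
activations = model.predict(window)
```

The point of the recurrence is that the output depends on the whole window of motion, not just the latest frame, which is what lets a model like this reason about movement rather than a static pose.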
The app has an Augmented Reality view on iPad, using Unity AR Foundation. A neural network performs image-to-pose estimation on the video frames, extracting the body segment orientations of the user in frame. Then, the app passes the orientations into the muscle activation network, which determines which muscles are activated. Finally, it validates the movement against the therapy requirements.
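The three steps above (pose, activations, validation) can be sketched roughly like this. Everything here is hypothetical: the function names, the 30-frame window, the "knee" angle, and the required activation ranges are placeholders, and both the pose step and the network are replaced with trivial stand-ins so the flow can run on its own.

```python
from collections import deque

WINDOW = 30  # frames of history passed to the network (assumed value)


def estimate_orientations(frame):
    """Stand-in for image-to-pose: here a 'frame' already carries its angles."""
    return frame["segment_angles"]


def predict_activations(window):
    """Stand-in for the trained network: fakes an activation from the mean knee angle."""
    mean_knee = sum(f["knee"] for f in window) / len(window)
    return {"quadriceps": min(1.0, mean_knee / 90.0), "hamstrings": 0.1}


def validate(activations, requirements):
    """Return the muscles whose activation falls outside the required range."""
    failures = []
    for muscle, (low, high) in requirements.items():
        value = activations.get(muscle, 0.0)
        if not (low <= value <= high):
            failures.append(muscle)
    return failures


def run(frames, requirements):
    """Feed frames through pose -> activations -> validation, one result per full window."""
    history = deque(maxlen=WINDOW)
    results = []
    for frame in frames:
        history.append(estimate_orientations(frame))
        if len(history) == WINDOW:
            results.append(validate(predict_activations(history), requirements))
    return results


# Hypothetical exercise: the quadriceps should be at least half activated.
requirements = {"quadriceps": (0.5, 1.0)}
frames = [{"segment_angles": {"knee": 60.0}} for _ in range(35)]
results = run(frames, requirements)
```

An empty list for a window means every required muscle was in range; a non-empty list names the muscles that were not, which is the kind of feedback an app could surface to the patient.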
Performance & Future
This model is less than 1 megabyte, so there is no problem shipping it in the runtime app. The inference time for the network is about 16 milliseconds, which fits within the roughly 16.7 ms frame budget at 60 FPS and makes for a very smooth experience on an iPad Pro. His future vision includes adding all of the upper body muscles, and he is eager to improve outcomes in physical therapy by making nuanced health training more accessible.
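As a quick sanity check on those numbers, the per-frame time budget at a given frame rate is just 1000 ms divided by the FPS. The 16 ms figure comes from the post; the assumption that inference runs serially inside each frame is mine, since the post does not say how the work is scheduled.

```python
def frame_budget_ms(fps):
    """Milliseconds available per frame at the given frame rate."""
    return 1000.0 / fps


INFERENCE_MS = 16.0  # inference time quoted in the post

budget = frame_budget_ms(60)       # roughly 16.67 ms per frame
headroom = budget - INFERENCE_MS   # what is left if inference runs serially each frame
fits = INFERENCE_MS <= budget
```

Under that serial assumption the headroom is well under a millisecond, so in practice an app would likely overlap inference with rendering or spread it across frames.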
Thank you for the answer Bill! That’s exciting.
Hi, we currently have a face detection model example on our Hugging Face page. We hope to have a pose estimation example there too in the coming weeks.
Alternatively, depending on your coding ability, you may like to try implementing the model yourself. Our documentation has instructions on how to convert models into the right format, and there are many models on places like Hugging Face to try.