VR: Training ML model to detect hand movement based on just velocity and positional data

Hi, I'm implementing an arm-swinging locomotion system in Unity, using a Quest 2 as the test device. The thing is, it's all written with classical rule-based conditionals. I was able to get the player to move forward when they swing their arms through constraints on the touch controllers' velocity.

The problem is, this method is easy to fool and returns many false positives; whenever the player moves their arms, the camera moves forward.

So I thought this would be the perfect project for getting started with ML. Could you guys please advise me on what I need to learn in order to train a model to recognize the hand movement patterns associated with walking forward? That way I can hopefully cut down on the false positives.

Thanks.

As with everything, you should start with the very simple basics first, and then try the more complicated stuff.

With your problem, I don't understand what the ground truth/labels are and what your Brain should output.


I feel like this is a vision task, for which supervised learning might be appropriate? Supervised learning is much faster than reinforcement learning, but you need annotated data. i.e. record a bunch of video, chop your videos into small clips of, I dunno, a second or so, depending on how long the actions last, then annotate those. Then train supervised. You'll likely need at least several thousand annotated video clips, but if you start with a pretrained vision model, you might be able to get away with fewer. I'm not sure of the relationship between this task and Unity? I feel like python + pytorch + some existing pretrained vision model might work well for this, perhaps?
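If it helps, a rough sketch of what that could look like, assuming torchvision's pretrained r3d_18 video classifier and clips that are already loaded as tensors (the clip loading and annotation pipeline is omitted, and the class labels here are just placeholders):

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18, R3D_18_Weights

# pretrained 3D ResNet; swap the classifier head for our two classes
model = r3d_18(weights=R3D_18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # 0 = other motion, 1 = walking swing

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(clips, labels):
    # clips: (batch, 3, frames, height, width) float tensor, e.g. 16 frames at 112x112
    # labels: (batch,) long tensor of class indices
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(clips), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Starting from pretrained weights like this is what lets you get away with far fewer annotated clips than training from scratch; the rest is mostly a data problem.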

[quote="hughperkins, post:3, topic:904530"]
I feel like this is a vision task, for which supervised learning might be appropriate?
[/quote]
Okay, now I get what "function"/behavior he wants to optimize for.
Like you said, if he says "this is the kind of movement I want, learn to detect it", then supervised learning is the best choice (arm positions plus velocities should be enough IMO).
But for that ML-Agents is not a good choice; any Python script will do. It should be enough to record short clips, write them to a file, train on them in e.g. Keras, export the result as an NN model, and that's it.
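A minimal sketch of that, assuming the clips have already been logged from Unity as fixed-length windows of controller positions and velocities (the file names, window length, and feature layout below are placeholder assumptions):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

WINDOW = 45    # ~0.5 s of samples at 90 Hz (assumption)
FEATURES = 12  # per sample: 2 controllers x (x, y, z position + x, y, z velocity)

# X: (num_clips, WINDOW, FEATURES) windows logged from Unity (placeholder file names)
# y: (num_clips,) labels, 1 = intentional walking swing, 0 = any other arm motion
X = np.load("clips.npy")
y = np.load("labels.npy")

model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, FEATURES)),
    layers.Conv1D(32, 5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # P(clip is a walking swing)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=30, batch_size=64, validation_split=0.2)

model.save("arm_swing.keras")
# to run it inside Unity you would typically convert to ONNX (e.g. with tf2onnx)
# and load it through Barracuda / Sentis
```

Classifying a short window of samples rather than a single frame is what should cut the false positives: make sure the recordings include plenty of "other" arm motion (gesturing, reaching, pointing) labeled as negatives, so the model learns the contrast.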
