Getting the difference between scene and video feed

I’ve been trying to wrap my head around how to do this but I’m at a loss.

I’m going to have a scene in Unity playing on a TV, and a real-world webcam pointing at the TV. The TV is going to have a small real-world figurine on it.

What I want to do is track the figurine by comparing what the scene looks like to what the webcam sees and finding the difference (which in theory would be the figurine). I would also want to track the position of the real-world figurine relative to the scene, in order to interact with the Unity scene.

One way I thought of doing this would be to capture a still image of what the scene currently looks like and compare it with an image of what the webcam currently sees. However, I can’t find or figure out a way to compute the difference between the two images. Comparing every pixel also sounds very performance heavy, and I need this to be as performant as possible, especially if I’m doing the comparison every frame.
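For illustration, the comparison described above could look roughly like the sketch below. It is only a toy under strong assumptions: both images are already grayscale, the same size, and perfectly aligned (the webcam view cropped and warped to match the scene), which is the hard part in practice. The function name `diff_centroid` and the threshold of 40 are made up for this example, and plain Python lists are used only so it runs without extra libraries; a real implementation would more likely use a native image library than Python loops.

```python
def diff_centroid(scene, webcam, threshold=40):
    """Return the normalised (x, y) centre of the differing pixels, or None.

    scene and webcam are equal-sized 2D lists of grayscale values (0-255).
    Assumption: the two images are already aligned pixel for pixel.
    The result is in the 0..1 range, with (0, 0) at the top-left corner.
    """
    height, width = len(scene), len(scene[0])
    sum_x = sum_y = count = 0
    for y in range(height):
        for x in range(width):
            # A pixel counts as "different" if it changed by more than the
            # threshold; this ignores small noise from the camera.
            if abs(scene[y][x] - webcam[y][x]) > threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # nothing differs, so no figurine was found
    # Use pixel centres (+0.5) so the result is independent of resolution.
    return ((sum_x / count + 0.5) / width, (sum_y / count + 0.5) / height)


# Hypothetical 4x4 "scene" with a 2x2 bright blob standing in for the figurine.
scene = [[100] * 4 for _ in range(4)]
webcam = [row[:] for row in scene]
for y in (1, 2):
    for x in (1, 2):
        webcam[y][x] = 200

print(diff_centroid(scene, webcam))  # (0.5, 0.5)
```

The normalised position could then be mapped into the scene's own coordinates. Running the comparison on a heavily downscaled copy of each frame is one way to keep the per-frame cost low, since only the rough position of the figurine is needed.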

Does anybody know of a way I can achieve the above results? Any help is greatly appreciated.

I’ve attached an image that illustrates what I mean.

It’s also worth mentioning that it will be impossible to know beforehand what the “figurine” looks like, i.e. my users should be able to use a thing of chapstick if they wanted.

This is why object tracking solutions like Vuforia would not work in this case.

This doesn’t sound all that plausible to create from scratch and have it perform in real time, while you’re also dealing with any number of environmental issues. My best guess would be getting really good with developing neural networks to do the analysis.


Weird that there were two computer vision posts by different folks so close together.

OpenCV is a library that can greatly help with this, as it is a compendium of image processing geared specifically towards computer vision.

That said, it will NOT solve the above problem out of the box. It just does the low-level grunting in a native performant way, and you can integrate it directly with Unity using this package on the Asset Store:

Look for a video demo of it and you can see what you’re up against. The finest engineers at Google, Microsoft and many other Fortune 500 tech companies collectively spend billions of dollars every year trying to solve the very problem you are asking to solve, as do governments looking for face recognition, security companies trying to detect fraud, etc.

It would be considered a Hard Problem™.

Good luck!


Thanks for the tip and support, I’ll check out OpenCV.