Hey there. Is there any object detection example with Sentis that uses a webcam or the phone camera? I believe this is something many of us are asking for, especially when using Sentis in an AR app.
We're working on a pure object detection sample, but in the meantime you can check out the depth-estimation AR demo we released.
All with code and a video tutorial.
We have new model examples on Hugging Face. YOLO is a good example for object detection, and BlazeFace has a good example of using the camera.
@PaulBUnity that's very cool man and helps a lot, thank you!
A couple of notes on the BlazeFace example:
- After a while, there's always a second tracking box created at the top left of the correct bounding box (see pic).
- To see the videoName variable, you must set it as public in the script.
Thanks again, and I would love to see more examples like that, especially if you manage to add an ARFoundation option to the available camera options.
Hi, immFX. Thanks for your feedback. You can raise the _scoreThreshold to make detection less sensitive. If that still doesn't work, can you share a link to the video(s) that show the problem, or even some still images (without the boxes), so we can see whether it's a problem with the code or just the model getting it wrong? Thanks.
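For reference, here is a minimal sketch of that tuning, assuming the sample exposes the threshold as a serialized field named _scoreThreshold (the class and filter method below are illustrative, not the actual sample script):

```
using UnityEngine;

public class FaceDetectorSketch : MonoBehaviour
{
    // Raise this toward 1.0 to keep only high-confidence detections.
    [SerializeField, Range(0f, 1f)]
    float _scoreThreshold = 0.75f;

    // Only detections scoring at or above the threshold get a bounding box.
    bool KeepDetection(float score)
    {
        return score >= _scoreThreshold;
    }
}
```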
Thanks for the prompt response.
Nope, the score threshold value is not to blame, and it is not a matter of the specific video or image (I tried the webcam as well and still get the second bounding box in the upper left). I believe it is not the model detecting falsely, but some problem in the code that is duplicating the boxes.
If you don't see it on your end, then maybe a third opinion might clear things up. I will take a closer look at the code when I find time and try to provide better feedback.
I see the problem. There was a missing factor of two in the second set of offsets (the model looks at two grids, one with cells twice as big). This would cause the larger faces to have a wrong offset. I'll do a quick fix; it should solve the problem. If not, I'll take a longer look.
I updated the script. You will see the changes on lines 190-191.
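For anyone following along, here is a minimal sketch of the kind of decoding the fix touches, assuming a BlazeFace-style setup with two anchor grids (all names below are illustrative, not the actual sample code):

```
using UnityEngine;

// Illustrative sketch: offsets predicted on the coarser grid must be
// scaled by 2 because its cells are twice as large; omitting that factor
// shifts the boxes for larger faces, as described above.
public static class AnchorDecodeSketch
{
    // gridScale: 1 for the fine grid, 2 for the coarse grid.
    public static Vector2 DecodeCenter(Vector2 anchorCenter, Vector2 rawOffset, float gridScale)
    {
        return anchorCenter + gridScale * rawOffset;
    }
}
```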
Works like a charm now! Thanks.
That's great! Glad to have helped.
Hello @PaulBUnity,
Looks like the OP's issue has been resolved, but please see my issue below and kindly assist me in fixing it. My main aim is to draw a bounding box and display the label of the object on a live camera feed from a webcam or an Android camera. However, I am having trouble with the inference and the generation of the bounding boxes.
I have taken the DepthEstimation example, DepthEstimation/Assets/Scripts/InferenceWebcam.cs
(sentis-samples/DepthEstimationSample/Assets/Scripts/InferenceWebcam.cs at main · Unity-Technologies/sentis-samples · GitHub), and the YOLOv8 example: https://huggingface.co/unity/sentis-yolotinyv7/blob/main/RunYOLO.cs . The DepthEstimation example shows how to use the camera (webcam) and RunYOLO.cs shows how to draw bounding boxes, so I thought of merging the two, removing the video player from the YOLO script because I want the live camera feed.
After merging the code, I now have these issues:
(1) The camera feed doesn't show up on the main GameObject, only in the Inspector window, and even there it is very slow.
(2) I don't get any bounding boxes; after attaching the debugger to Unity, the code seems to skip the bounding-box creation.
Below is my short code snippet:

```
void Start()
{
Application.targetFrameRate = 60;
Screen.orientation = ScreenOrientation.LandscapeLeft;
//Parse neural net labels
labels = labelsAsset.text.Split('\n');
//Load model
model = ModelLoader.Load(Application.streamingAssetsPath +"/"+ modelName);
// targetRT = new RenderTexture(imageWidth, imageHeight, 0);
//Create image to display video
displayLocation = displayImage.transform;
//Create engine to run model
engine = WorkerFactory.CreateWorker(backend, model);
SetupInput();
}
void SetupInput()
{
#if false
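// Video-file input path (disabled); the #else branch below uses the webcam instead.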
video = gameObject.AddComponent<VideoPlayer>();
video.renderMode = VideoRenderMode.APIOnly;
video.source = VideoSource.Url;
video.url = Application.streamingAssetsPath + "/" + videoName;
video.isLooping = true;
video.Play();
#else
var devices = WebCamTexture.devices;
if (devices.Length == 0)
{
Debug.Log("No camera detected!");
return;
}
var deviceName = devices[0].name;
var camTexture = new WebCamTexture(deviceName, imageWidth, imageHeight, 30);
camTexture.Play();
displayImage.texture = camTexture;
#endif
}
private void Update()
{
ExecuteML();
if (Input.GetKeyDown(KeyCode.Escape))
{
Application.Quit();
}
}
public void ExecuteML()
{
ClearAnnotations();
System.Console.WriteLine("debug 1");
if (displayImage.texture is WebCamTexture camTexture)
{
using var input = TextureConverter.ToTensor(camTexture, imageWidth, imageHeight, 3);
engine.Execute(input);
//Read output tensors
var output = engine.PeekOutput() as TensorFloat;
output.MakeReadable();
float displayWidth = displayImage.rectTransform.rect.width;
float displayHeight = displayImage.rectTransform.rect.height;
float scaleX = displayWidth / imageWidth;
float scaleY = displayHeight / imageHeight;
//Draw the bounding boxes
for (int n = 0; n < output.shape[0]; n++)
{
var box = new BoundingBox
{
centerX = ((output[n, 1] + output[n, 3]) * scaleX - displayWidth) / 2,
centerY = ((output[n, 2] + output[n, 4]) * scaleY - displayHeight) / 2,
width = (output[n, 3] - output[n, 1]) * scaleX,
height = (output[n, 4] - output[n, 2]) * scaleY,
label = labels[(int)output[n, 5]],
confidence = Mathf.FloorToInt(output[n, 6] * 100 + 0.5f)
};
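// Note: the box struct is only constructed here; no draw call ever renders it.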
}
}
}
```
Thanks and Regards,
S
@PaulBUnity any update? Please let me know how I can proceed. Here is my full merged script:
```
using System.Collections.Generic;
using Unity.Sentis;
using UnityEngine;
using UnityEngine.UI;
using Lays = Unity.Sentis.Layers;
public class RunYOLO8n : MonoBehaviour
{
const string modelName = "yolov8n.sentis";
public TextAsset labelsAsset;
public RawImage displayImage;
public Sprite boxTexture;
public Font font;
const BackendType backend = BackendType.GPUCompute;
private Transform displayLocation;
private Model model;
private IWorker engine;
private string[] labels;
private RenderTexture targetRT;
private WebCamTexture webcamTexture; // Added WebCamTexture variable
private const int imageWidth = 640;
private const int imageHeight = 640;
private const int numClasses = 80;
[SerializeField, Range(0, 1)] float iouThreshold = 0.5f;
[SerializeField, Range(0, 1)] float scoreThreshold = 0.5f;
int maxOutputBoxes = 64;
//For using tensor operators:
Ops ops;
private List<GameObject> boxPool = new List<GameObject>();
//bounding box data
public struct BoundingBox
{
public float centerX;
public float centerY;
public float width;
public float height;
public string label;
}
private void Start()
{
Application.targetFrameRate = 60;
Screen.orientation = ScreenOrientation.LandscapeLeft;
ops = WorkerFactory.CreateOps(backend, null);
labels = labelsAsset.text.Split('\n');
LoadModel();
targetRT = new RenderTexture(imageWidth, imageHeight, 0);
displayLocation = displayImage.transform;
engine = WorkerFactory.CreateWorker(backend, model);
SetupInput();
}
void LoadModel()
{
//Load model
model = ModelLoader.Load(Application.streamingAssetsPath + "/" + modelName);
//The classes are also stored here in JSON format:
Debug.Log($"Class names: \n{model.Metadata["names"]}");
//We need to add some layers to choose the best boxes with the NMSLayer
//Set constants
model.AddConstant(new Lays.Constant("0", new int[] { 0 }));
model.AddConstant(new Lays.Constant("1", new int[] { 1 }));
model.AddConstant(new Lays.Constant("4", new int[] { 4 }));
model.AddConstant(new Lays.Constant("classes_plus_4", new int[] { numClasses + 4 }));
model.AddConstant(new Lays.Constant("maxOutputBoxes", new int[] { maxOutputBoxes }));
model.AddConstant(new Lays.Constant("iouThreshold", new float[] { iouThreshold }));
model.AddConstant(new Lays.Constant("scoreThreshold", new float[] { scoreThreshold }));
//Add layers
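//output0 has shape (1, 84, 8400): rows 0-3 are box coordinates, rows 4-83 are the 80 class scores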
model.AddLayer(new Lays.Slice("boxCoords0", "output0", "0", "4", "1"));
model.AddLayer(new Lays.Transpose("boxCoords", "boxCoords0", new int[] { 0, 2, 1 }));
model.AddLayer(new Lays.Slice("scores0", "output0", "4", "classes_plus_4", "1"));
model.AddLayer(new Lays.ReduceMax("scores", new[] { "scores0", "1" }));
model.AddLayer(new Lays.ArgMax("classIDs", "scores0", 1));
model.AddLayer(new Lays.NonMaxSuppression("NMS", "boxCoords", "scores",
"maxOutputBoxes", "iouThreshold", "scoreThreshold",
centerPointBox: Lays.CenterPointBox.Center
));
model.outputs.Clear();
model.AddOutput("boxCoords");
model.AddOutput("classIDs");
model.AddOutput("NMS");
}
void SetupInput()
{
// Start webcam
webcamTexture = new WebCamTexture();
webcamTexture.Play();
}
private void Update()
{
ExecuteML();
if (Input.GetKeyDown(KeyCode.Escape))
{
Application.Quit();
}
}
public void ExecuteML()
{
ClearAnnotations();
// Check if webcam texture is available
if (webcamTexture != null && webcamTexture.isPlaying && webcamTexture.width > 0 && webcamTexture.height > 0)
{
// Process webcam texture
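// Graphics.Blit stretches the webcam frame to the 640x640 render texture the model expects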
Graphics.Blit(webcamTexture, targetRT);
displayImage.texture = targetRT;
}
else return;
using var input = TextureConverter.ToTensor(targetRT, imageWidth, imageHeight, 3);
engine.Execute(input);
var boxCoords = engine.PeekOutput("boxCoords") as TensorFloat;
var NMS = engine.PeekOutput("NMS") as TensorInt;
var classIDs = engine.PeekOutput("classIDs") as TensorInt;
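// Each NMS result row is (batch index, class index, box index); slice out column 2 to get the selected box indices.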
using var boxIDs = ops.Slice(NMS, new int[] { 2 }, new int[] { 3 }, new int[] { 1 }, new int[] { 1 });
using var boxIDsFlat = boxIDs.ShallowReshape(new TensorShape(boxIDs.shape.length)) as TensorInt;
using var output = ops.Gather(boxCoords, boxIDsFlat, 1);
using var labelIDs = ops.Gather(classIDs, boxIDsFlat, 2);
output.MakeReadable();
labelIDs.MakeReadable();
float displayWidth = displayImage.rectTransform.rect.width;
float displayHeight = displayImage.rectTransform.rect.height;
float scaleX = displayWidth / imageWidth;
float scaleY = displayHeight / imageHeight;
//Draw the bounding boxes
for (int n = 0; n < output.shape[1]; n++)
{
var box = new BoundingBox
{
centerX = output[0, n, 0] * scaleX - displayWidth / 2,
centerY = output[0, n, 1] * scaleY - displayHeight / 2,
width = output[0, n, 2] * scaleX,
height = output[0, n, 3] * scaleY,
label = labels[labelIDs[0, 0, n]],
};
DrawBox(box, n);
}
}
public void DrawBox(BoundingBox box, int id)
{
//Create the bounding box graphic or get from pool
GameObject panel;
if (id < boxPool.Count)
{
panel = boxPool[id];
panel.SetActive(true);
}
else
{
panel = CreateNewBox(Color.yellow);
}
//Set box position
panel.transform.localPosition = new Vector3(box.centerX, -box.centerY);
//Set box size
RectTransform rt = panel.GetComponent<RectTransform>();
rt.sizeDelta = new Vector2(box.width, box.height);
//Set label text
var label = panel.GetComponentInChildren<Text>();
label.text = box.label;
}
public GameObject CreateNewBox(Color color)
{
//Create the box and set image
var panel = new GameObject("ObjectBox");
panel.AddComponent<CanvasRenderer>();
Image img = panel.AddComponent<Image>();
img.color = color;
img.sprite = boxTexture;
img.type = Image.Type.Sliced;
panel.transform.SetParent(displayLocation, false);
//Create the label
var text = new GameObject("ObjectLabel");
text.AddComponent<CanvasRenderer>();
text.transform.SetParent(panel.transform, false);
Text txt = text.AddComponent<Text>();
txt.font = font;
txt.color = color;
txt.fontSize = 40;
txt.horizontalOverflow = HorizontalWrapMode.Overflow;
RectTransform rt2 = text.GetComponent<RectTransform>();
rt2.offsetMin = new Vector2(20, rt2.offsetMin.y);
rt2.offsetMax = new Vector2(0, rt2.offsetMax.y);
rt2.offsetMin = new Vector2(rt2.offsetMin.x, 0);
rt2.offsetMax = new Vector2(rt2.offsetMax.x, 30);
rt2.anchorMin = new Vector2(0, 0);
rt2.anchorMax = new Vector2(1, 1);
boxPool.Add(panel);
return panel;
}
public void ClearAnnotations()
{
foreach (var box in boxPool)
{
box.SetActive(false);
}
}
private void OnDestroy()
{
engine?.Dispose();
ops?.Dispose();
}
// Other methods remain unchanged
}
```