An object detection example?

Hey there. Is there any object detection example with Sentis that uses a webcam or the phone camera? I believe this is something many of us need, especially when using Sentis in an AR app.

We’re working on a dedicated object detection sample, but in the meantime you can check out the depth-estimation AR demo we released.

All with code and a video tutorial :wink:

We have new model examples on Hugging Face. YOLO is a good example for object detection, and BlazeFace is a good example of using the camera.

@PaulBUnity that’s very cool and helps a lot, thank you!

A couple of notes on the BlazeFace example:

  1. After a while, there’s always a second tracking box created at the top left of the correct bounding box (see pic)

  2. To see the videoName variable in the Inspector, you must declare it as public in the script


Thanks again, I would love to see more examples like that, especially if you manage to add an ARFoundation option to the available camera options.

Hi, immFX. Thanks for your feedback. You can raise the _scoreThreshold to make detection less sensitive. If that still doesn’t work, can you share a link to the video(s) that have the problem, or even some still images (without the boxes)? Then we can see if it’s a problem with the code or just the model getting it wrong. Thanks. :slightly_smiling_face:
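For illustration, raising the threshold simply filters out low-confidence detections before a box is drawn. A minimal sketch of the idea — the field name `_scoreThreshold` comes from the sample, but the surrounding code here is hypothetical, not the sample’s actual implementation:

```csharp
// Hypothetical filtering step: only detections at or above the
// threshold produce a bounding box.
[SerializeField, Range(0, 1)] float _scoreThreshold = 0.75f; // raised from a lower default

void OnDetection(float score, Rect box)
{
    if (score < _scoreThreshold)
        return; // discard low-confidence detections
    DrawBox(box); // hypothetical drawing helper
}
```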

Thanks for the prompt response.

Nope, the score threshold value is not to blame, and it is not a matter of the specific video or image (I tried the webcam as well and still get the second bounding box at the upper left). I believe it is not the model detecting falsely, but some problem in the code that is duplicating the boxes.


If you don’t see it on your end, then maybe a third opinion might clear things up. I will try to take a closer look at the code when I find time and maybe provide better feedback.

I see the problem. There was a missing factor of two in the second set of offsets (the model looks at two grids, one with squares twice as big). This would cause the larger faces to have a wrong offset. I’ll push a quick fix; it should solve the problem. If not, I’ll take a longer look.
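For anyone curious about the bug: BlazeFace decodes boxes against two anchor grids, and the cells of the second grid are twice the size of the first, so offsets expressed in cell units need to be scaled accordingly. A hedged sketch of the idea — the variable names are illustrative, not the sample’s actual code:

```csharp
// Illustrative decode step: offsets are expressed in grid-cell units,
// so the coarser grid (cells twice as big) needs a factor of two.
// Omitting that factor shifts the larger boxes toward the origin
// (top left), which matches the spurious box reported above.
float cellScale = isCoarseGrid ? 2f : 1f;
float centerX = (gridX + offsetX * cellScale) * baseCellSize;
float centerY = (gridY + offsetY * cellScale) * baseCellSize;
```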

I updated the script. You will see the changes on lines 190-191.

Works like a charm now! Thanks.

That’s great! Glad to have helped. :slightly_smiling_face:

Hello @PaulBUnity,
It looks like the OP’s issue has been resolved, but please see my issue below and kindly assist me in fixing it. My main aim is to draw bounding boxes and display object labels over a live camera feed (webcam or Android camera), but I am having trouble with the inference and the generation of the bounding boxes.
I started from the DepthEstimation example, DepthEstimation/Assets/Scripts/InferenceWebcam.cs (sentis-samples/DepthEstimationSample/Assets/Scripts/InferenceWebcam.cs at main · Unity-Technologies/sentis-samples · GitHub), and the YOLO example: https://huggingface.co/unity/sentis-yolotinyv7/blob/main/RunYOLO.cs. The DepthEstimation example shows how to use the camera (webcam), and RunYOLO.cs shows how to draw bounding boxes, so I merged the two, removing the VideoPlayer from the YOLO script because I want the live camera.
After merging the code I have these issues:
(1) The camera feed does not show up on the main GameObject, only in the Inspector window, and even there it is very slow.
(2) I don’t get any bounding boxes; after attaching the debugger to Unity, the code seems to skip the bounding-box creation.

Below is my short code snippet :

void Start()
    {
        Application.targetFrameRate = 60;
        Screen.orientation = ScreenOrientation.LandscapeLeft;
        
        //Parse neural net labels
        labels = labelsAsset.text.Split('\n');

        //Load model
        model = ModelLoader.Load(Application.streamingAssetsPath +"/"+ modelName);

       // targetRT = new RenderTexture(imageWidth, imageHeight, 0);

        //Create image to display video
        displayLocation = displayImage.transform;

        //Create engine to run model
        engine = WorkerFactory.CreateWorker(backend, model);

        SetupInput();
    }
    void SetupInput()
    {

#if false
        video = gameObject.AddComponent<VideoPlayer>();
        video.renderMode = VideoRenderMode.APIOnly;
        video.source = VideoSource.Url;
        video.url = Application.streamingAssetsPath + "/" + videoName;
        video.isLooping = true;
        video.Play();
#else
        var devices = WebCamTexture.devices;
        if (devices.Length == 0)
        {
            Debug.Log("No camera detected!");
            return;
        }

        var deviceName = devices[0].name;
        var camTexture = new WebCamTexture(deviceName, imageWidth, imageHeight, 30);
        
        camTexture.Play();
        displayImage.texture = camTexture;

#endif
    }

    private void Update()
    {
        ExecuteML();

        if (Input.GetKeyDown(KeyCode.Escape))
        {
            Application.Quit();
        }
    }

    public void ExecuteML()
    {
        ClearAnnotations();

        Debug.Log("debug 1"); // Console.WriteLine output does not appear in the Unity Console

        if (displayImage.texture is WebCamTexture camTexture)
        {
            using var input = TextureConverter.ToTensor(camTexture, imageWidth, imageHeight, 3);
            engine.Execute(input);

            //Read output tensors
            var output = engine.PeekOutput() as TensorFloat;
            output.MakeReadable();

            float displayWidth = displayImage.rectTransform.rect.width;
            float displayHeight = displayImage.rectTransform.rect.height;

            float scaleX = displayWidth / imageWidth;
            float scaleY = displayHeight / imageHeight;



            //Draw the bounding boxes
            for (int n = 0; n < output.shape[0]; n++)
            {
                var box = new BoundingBox
                {
                    centerX = ((output[n, 1] + output[n, 3]) * scaleX - displayWidth) / 2,
                    centerY = ((output[n, 2] + output[n, 4]) * scaleY - displayHeight) / 2,
                    width = (output[n, 3] - output[n, 1]) * scaleX,
                    height = (output[n, 4] - output[n, 2]) * scaleY,
                    label = labels[(int)output[n, 5]],
                    confidence = Mathf.FloorToInt(output[n, 6] * 100 + 0.5f)
                };
            }
        }
    }

Thanks and Regards,
S

@PaulBUnity, any update? Please let me know how I can proceed.

'''

using System.Collections.Generic;
using Unity.Sentis;
using UnityEngine;
using UnityEngine.UI;
using Lays = Unity.Sentis.Layers;

public class RunYOLO8n : MonoBehaviour
{
const string modelName = "yolov8n.sentis";
public TextAsset labelsAsset;
public RawImage displayImage;
public Sprite boxTexture;
public Font font;

const BackendType backend = BackendType.GPUCompute;

private Transform displayLocation;
private Model model;
private IWorker engine;
private string[] labels;
private RenderTexture targetRT;
private WebCamTexture webcamTexture; // Added WebCamTexture variable

private const int imageWidth = 640;
private const int imageHeight = 640;
private const int numClasses = 80;

[SerializeField, Range(0, 1)] float iouThreshold = 0.5f;
[SerializeField, Range(0, 1)] float scoreThreshold = 0.5f;
int maxOutputBoxes = 64;

//For using tensor operators:
Ops ops;

private List<GameObject> boxPool = new List<GameObject>();

//bounding box data
public struct BoundingBox
{
    public float centerX;
    public float centerY;
    public float width;
    public float height;
    public string label;
}

private void Start()
{
    Application.targetFrameRate = 60;
    Screen.orientation = ScreenOrientation.LandscapeLeft;

    ops = WorkerFactory.CreateOps(backend, null);

    labels = labelsAsset.text.Split('\n');

    LoadModel();

    targetRT = new RenderTexture(imageWidth, imageHeight, 0);
    displayLocation = displayImage.transform;

    engine = WorkerFactory.CreateWorker(backend, model);

    SetupInput();
}

void LoadModel()
{
    //Load model
    model = ModelLoader.Load(Application.streamingAssetsPath + "/" + modelName);

    //The classes are also stored here in JSON format:
    Debug.Log($"Class names: \n{model.Metadata["names"]}");

    //We need to add some layers to choose the best boxes with the NMSLayer

    //Set constants
    model.AddConstant(new Lays.Constant("0", new int[] { 0 }));
    model.AddConstant(new Lays.Constant("1", new int[] { 1 }));
    model.AddConstant(new Lays.Constant("4", new int[] { 4 }));


    model.AddConstant(new Lays.Constant("classes_plus_4", new int[] { numClasses + 4 }));
    model.AddConstant(new Lays.Constant("maxOutputBoxes", new int[] { maxOutputBoxes }));
    model.AddConstant(new Lays.Constant("iouThreshold", new float[] { iouThreshold }));
    model.AddConstant(new Lays.Constant("scoreThreshold", new float[] { scoreThreshold }));

    //Add layers
    model.AddLayer(new Lays.Slice("boxCoords0", "output0", "0", "4", "1"));
    model.AddLayer(new Lays.Transpose("boxCoords", "boxCoords0", new int[] { 0, 2, 1 }));
    model.AddLayer(new Lays.Slice("scores0", "output0", "4", "classes_plus_4", "1"));
    model.AddLayer(new Lays.ReduceMax("scores", new[] { "scores0", "1" }));
    model.AddLayer(new Lays.ArgMax("classIDs", "scores0", 1));

    model.AddLayer(new Lays.NonMaxSuppression("NMS", "boxCoords", "scores",
        "maxOutputBoxes", "iouThreshold", "scoreThreshold",
        centerPointBox: Lays.CenterPointBox.Center
    ));

    model.outputs.Clear();
    model.AddOutput("boxCoords");
    model.AddOutput("classIDs");
    model.AddOutput("NMS");
}

void SetupInput()
{
    // Start webcam
    webcamTexture = new WebCamTexture();
    webcamTexture.Play();
}

private void Update()
{
    ExecuteML();

    if (Input.GetKeyDown(KeyCode.Escape))
    {
        Application.Quit();
    }
}

public void ExecuteML()
{
    ClearAnnotations();

    // Check if webcam texture is available
    if (webcamTexture != null && webcamTexture.isPlaying && webcamTexture.width > 0 && webcamTexture.height > 0)
    {
        // Process webcam texture
        Graphics.Blit(webcamTexture, targetRT);
        displayImage.texture = targetRT;
    }
    else return;

    using var input = TextureConverter.ToTensor(targetRT, imageWidth, imageHeight, 3);
    engine.Execute(input);

    var boxCoords = engine.PeekOutput("boxCoords") as TensorFloat;
    var NMS = engine.PeekOutput("NMS") as TensorInt;
    var classIDs = engine.PeekOutput("classIDs") as TensorInt;

    using var boxIDs = ops.Slice(NMS, new int[] { 2 }, new int[] { 3 }, new int[] { 1 }, new int[] { 1 });
    using var boxIDsFlat = boxIDs.ShallowReshape(new TensorShape(boxIDs.shape.length)) as TensorInt;
    using var output = ops.Gather(boxCoords, boxIDsFlat, 1);
    using var labelIDs = ops.Gather(classIDs, boxIDsFlat, 2);

    output.MakeReadable();
    labelIDs.MakeReadable();

    float displayWidth = displayImage.rectTransform.rect.width;
    float displayHeight = displayImage.rectTransform.rect.height;

    float scaleX = displayWidth / imageWidth;
    float scaleY = displayHeight / imageHeight;

    //Draw the bounding boxes
    for (int n = 0; n < output.shape[1]; n++)
    {
        var box = new BoundingBox
        {
            centerX = output[0, n, 0] * scaleX - displayWidth / 2,
            centerY = output[0, n, 1] * scaleY - displayHeight / 2,
            width = output[0, n, 2] * scaleX,
            height = output[0, n, 3] * scaleY,
            label = labels[labelIDs[0, 0, n]],
        };
        DrawBox(box, n);
    }
}
public void DrawBox(BoundingBox box, int id)
{
    //Create the bounding box graphic or get from pool
    GameObject panel;
    if (id < boxPool.Count)
    {
        panel = boxPool[id];
        panel.SetActive(true);
    }
    else
    {
        panel = CreateNewBox(Color.yellow);
    }
    //Set box position
    panel.transform.localPosition = new Vector3(box.centerX, -box.centerY);

    //Set box size
    RectTransform rt = panel.GetComponent<RectTransform>();
    rt.sizeDelta = new Vector2(box.width, box.height);

    //Set label text
    var label = panel.GetComponentInChildren<Text>();
    label.text = box.label;
}

public GameObject CreateNewBox(Color color)
{
    //Create the box and set image

    var panel = new GameObject("ObjectBox");
    panel.AddComponent<CanvasRenderer>();
    Image img = panel.AddComponent<Image>();
    img.color = color;
    img.sprite = boxTexture;
    img.type = Image.Type.Sliced;
    panel.transform.SetParent(displayLocation, false);

    //Create the label

    var text = new GameObject("ObjectLabel");
    text.AddComponent<CanvasRenderer>();
    text.transform.SetParent(panel.transform, false);
    Text txt = text.AddComponent<Text>();
    txt.font = font;
    txt.color = color;
    txt.fontSize = 40;
    txt.horizontalOverflow = HorizontalWrapMode.Overflow;

    RectTransform rt2 = text.GetComponent<RectTransform>();
    rt2.offsetMin = new Vector2(20, rt2.offsetMin.y);
    rt2.offsetMax = new Vector2(0, rt2.offsetMax.y);
    rt2.offsetMin = new Vector2(rt2.offsetMin.x, 0);
    rt2.offsetMax = new Vector2(rt2.offsetMax.x, 30);
    rt2.anchorMin = new Vector2(0, 0);
    rt2.anchorMax = new Vector2(1, 1);

    boxPool.Add(panel);
    return panel;
}

public void ClearAnnotations()
{
    foreach (var box in boxPool)
    {
        box.SetActive(false);
    }
}

private void OnDestroy()
{
    engine?.Dispose();
    ops?.Dispose();
}
// Other methods remain unchanged

}
'''
