The YOLOv7-tiny example with the COCO dataset on Hugging Face works fine.
I tried to use the same example with a YOLOv8n ONNX model that I converted from the official Ultralytics weights using the recommended Ultralytics export (Python code below):
from ultralytics import YOLO

model = YOLO('yolov8n-weight/yolov8n.pt')   # load the pre-trained weights
path = model.export(format="onnx", opset=15)  # export to ONNX
The model was serialized in Unity with no errors, but when I tried to run it with the YOLOv7-tiny code, it failed in the part of the code that draws the bounding boxes:
I would like to share some information I have found:
The YOLOv7 ONNX in the example contains NMS, while the YOLOv8n models we use do not (neither the object detection nor the pose variant), so their outputs are very different. You can check their ONNX models:
Ultralytics provides an API that solves the output problem of YOLOv8: the raw output is converted directly into [number of results] × [boxes and points]: ultralytics.utils.ops.non_max_suppression
By the way, torch.Size([3, 15]) → 3 = number of results (dynamic), 15 = box top-left X + box top-left Y + box bottom-right X + box bottom-right Y + box confidence + box object category + 3 × pose point (x, y, confidence). This model was trained with yolov8n-pose and only tracks 3 keypoints. The raw output shape of the model is torch.Size([1, 14, 8400]).
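A minimal sketch of how that API could be used outside Unity, assuming the raw pose output shape above (the random tensor, the thresholds, and the nc=1 setting are illustrative, not taken from the example project):

import torch
from ultralytics.utils import ops

raw = torch.randn(1, 14, 8400)      # stand-in for the raw ONNX output: 4 box + 1 class + 3*3 keypoint values
results = ops.non_max_suppression(
    raw,
    conf_thres=0.25,
    iou_thres=0.45,
    nc=1,                           # yolov8n-pose has a single class
)
# results is a list with one tensor per image; each tensor is [num_detections, 15]:
# x1, y1, x2, y2, confidence, class, then 3 keypoints * (x, y, confidence)
print(results[0].shape)             # e.g. torch.Size([3, 15])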
But obviously we can't use this API in Unity; we need to post-process this 1 × 14 × 8400 result ourselves (or 1 × 56 × 8400 for the pose example, or 1 × 5 × 8400 for the object detection example).
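For reference, here is a minimal NumPy sketch of that post-processing for the 1 × 14 × 8400 pose output (the function name, thresholds, and layout are illustrative; the same logic would have to be ported to C# for Unity):

import numpy as np

def decode_pose_output(raw, conf_thres=0.25, iou_thres=0.45):
    # raw: [1, 14, 8400] -> one candidate per row: [8400, 14]
    preds = raw[0].T
    mask = preds[:, 4] > conf_thres            # column 4 is the single class score
    preds = preds[mask]
    scores = preds[:, 4]

    # (cx, cy, w, h) -> (x1, y1, x2, y2)
    cx, cy, w, h = preds[:, 0], preds[:, 1], preds[:, 2], preds[:, 3]
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    kpts = preds[:, 5:]                        # [N, 9] = 3 keypoints * (x, y, confidence)

    # greedy IoU-based NMS
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_rest - inter + 1e-7)
        order = order[1:][iou <= iou_thres]

    return boxes[keep], scores[keep], kpts[keep]

Whichever way the decoding is done, the coordinates come back in the network's input resolution and still need to be scaled back to the source image before drawing.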
I read the code in unity/sentis-blaze-face. It adds an NMS layer to the model, but I'm not sure how this is done or what each layer means. RunBlazeFace.cs · unity/sentis-blaze-face at main
I'm very much looking forward to Unity providing an API similar to Ultralytics' to solve the NMS problem. But I also realize that the output of each model is very different, and Sentis, as a highly compatible runtime, may not do such customization for a single model.
So I think there should be more tutorials about layers, for example: what exactly these layers mean (more examples please), why add Layers.Constant, why use Layers.Slice, what the limitations of Layers.NonMaxSuppression are… And we need more post-processing demos for popular pre-trained models.
Yes, we need to add an NMS op to make this easier.
The issue is that right now our NMS layer needs its inputs to be constant tensors; ideally it would accept dynamic inputs, similar to nms — Torchvision 0.16 documentation.
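For comparison, the Torchvision call linked above takes runtime tensors, so it is fully dynamic (the boxes and scores below are dummy values for illustration):

import torch
from torchvision.ops import nms

boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],
                      [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])
keep = nms(boxes, scores, iou_threshold=0.5)   # indices of boxes kept after NMS
print(keep)                                    # tensor([0, 2]): the second box overlaps the first too much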
What is more, our NMS currently runs purely on the CPU and needs to be optimized a bit.
If you wish, you can re-implement NMS yourself to make this code more streamlined, but we will add it in a future release.
Hi, we put a new YOLOv8 model on our Hugging Face. It should suit your purposes.
Note: we take a ReduceMax over the class scores before feeding them into the NMS layer. This may result in a marginal drop in accuracy when different objects overlap, with the benefit of being much faster. But we are looking to improve this in future versions.
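To illustrate the idea (shapes and values below are made up, not taken from the model): the per-box class scores are collapsed to a single best score before NMS, so only one class-agnostic NMS pass is needed:

import numpy as np

class_scores = np.random.rand(8400, 80)    # one row of class scores per candidate box
best_score = class_scores.max(axis=1)      # ReduceMax over the class axis -> one score per box
best_class = class_scores.argmax(axis=1)   # keep the winning class id for labelling
# best_score is the single confidence fed to NMS, so one class-agnostic pass is enough;
# the trade-off is that overlapping boxes of different classes can now suppress each other.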
There are usually components already in Unity for displaying the UI frames etc., or you can easily create one. The examples are just supposed to be a starting point. For example, you are not restricted to using the Unity UI system if you want to display bounding boxes some other way.