Does anyone know how I can capture an image (from the camera view) and produce the coordinates of the bounding box for an object in the captured image?
Capturing the image is fine, but I am struggling with generating the coordinates.
If you want a rectangular region of the screen, you should be able to get a conservative bounding box on screen from an axis-aligned bounding box (AABB) for the object. You can get such a box (a Bounds struct, in world space) from the bounds property of the Collider or Renderer on the target object. Then take all 8 corners of the box and use Camera.WorldToScreenPoint to get each corner’s position on the screen (ignore the z component, which is depth). The on-screen bounds are the extremes reached by all 8 transformed points: collect the minimum and maximum x and y values among them, which gives the 4 corners (minX, minY), (minX, maxY), (maxX, minY), (maxX, maxY). Depending on how the object is oriented relative to the camera, the resulting area can be noticeably larger than the object itself, but it will at least contain the object as long as the AABB is correct and lies in front of the camera.
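A minimal sketch of the AABB approach, assuming a camera and a target Renderer assigned in the Inspector. The class name `ScreenBoundsFromAabb`, the fields `cam` and `target`, and the helpers `GetScreenRect` and `ToTopLeftOrigin` are hypothetical. It also uses the z value that the approach otherwise ignores, but only to bail out when a corner is behind the camera, where the projected position is not meaningful.

```csharp
using UnityEngine;

// Sketch: conservative screen-space box from the object's world-space AABB.
public class ScreenBoundsFromAabb : MonoBehaviour
{
    public Camera cam;       // hypothetical: the camera whose view you capture
    public Renderer target;  // hypothetical: renderer on the object of interest

    // Returns the box in screen pixels (origin at bottom-left),
    // or null if any corner is behind the camera.
    public static Rect? GetScreenRect(Camera cam, Bounds b)
    {
        Vector3 c = b.center;
        Vector3 e = b.extents;

        float minX = float.MaxValue, minY = float.MaxValue;
        float maxX = float.MinValue, maxY = float.MinValue;

        // Each bit of i picks the sign of one extent, giving all 8 corners.
        for (int i = 0; i < 8; i++)
        {
            Vector3 corner = c + new Vector3(
                (i & 1) == 0 ? -e.x : e.x,
                (i & 2) == 0 ? -e.y : e.y,
                (i & 4) == 0 ? -e.z : e.z);

            Vector3 p = cam.WorldToScreenPoint(corner);
            if (p.z < 0f) return null; // behind the camera: x/y are not usable

            minX = Mathf.Min(minX, p.x);
            minY = Mathf.Min(minY, p.y);
            maxX = Mathf.Max(maxX, p.x);
            maxY = Mathf.Max(maxY, p.y);
        }

        return Rect.MinMaxRect(minX, minY, maxX, maxY);
    }

    // Hypothetical helper: flips a bottom-left-origin rect to a top-left origin,
    // assuming the captured image has the same pixel size as the camera view.
    public static Rect ToTopLeftOrigin(Rect r, int imageHeight)
    {
        return Rect.MinMaxRect(r.xMin, imageHeight - r.yMax, r.xMax, imageHeight - r.yMin);
    }

    void Update()
    {
        Rect? r = GetScreenRect(cam, target.bounds);
        if (r.HasValue)
            Debug.Log($"Screen bbox: {r.Value}");
    }
}
```

Camera.WorldToScreenPoint puts (0, 0) at the bottom-left, while many image formats and labeling tools put (0, 0) at the top-left, which is why the flip helper is included. If your capture resolution differs from the camera’s pixel size, you would also need to scale the rect accordingly.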
Alternatively, if you want a tighter bounding box and have a readable Mesh for the object (from a MeshCollider’s sharedMesh, or from the MeshFilter that supplies the mesh a MeshRenderer draws), you could use the object’s Transform.localToWorldMatrix to convert the mesh’s vertices to world space, then project and aggregate them the same way as the AABB corners. “Readable” here means Read/Write is enabled in the mesh’s import settings; otherwise the vertex data may not be accessible at runtime.
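A sketch of the per-vertex version, taking the Mesh and Transform separately so it works with either a MeshFilter or a MeshCollider. The class and method names are hypothetical, and it assumes the mesh is readable and not skinned (a SkinnedMeshRenderer’s sharedMesh would not reflect the current pose).

```csharp
using UnityEngine;

// Sketch: tighter screen-space box from the mesh's actual vertices.
public static class ScreenBoundsFromMesh
{
    // Pass e.g. meshFilter.sharedMesh or meshCollider.sharedMesh, plus that
    // component's transform. Returns null for an empty mesh or if any vertex
    // is behind the camera.
    public static Rect? GetScreenRect(Camera cam, Mesh mesh, Transform t)
    {
        Vector3[] vertices = mesh.vertices; // returns a copy; cache it if you call this often
        if (vertices.Length == 0) return null;

        Matrix4x4 localToWorld = t.localToWorldMatrix;

        float minX = float.MaxValue, minY = float.MaxValue;
        float maxX = float.MinValue, maxY = float.MinValue;

        foreach (Vector3 v in vertices)
        {
            // localToWorldMatrix is affine, so MultiplyPoint3x4 is sufficient.
            Vector3 world = localToWorld.MultiplyPoint3x4(v);
            Vector3 p = cam.WorldToScreenPoint(world);
            if (p.z < 0f) return null; // behind the camera

            minX = Mathf.Min(minX, p.x);
            minY = Mathf.Min(minY, p.y);
            maxX = Mathf.Max(maxX, p.x);
            maxY = Mathf.Max(maxY, p.y);
        }

        return Rect.MinMaxRect(minX, minY, maxX, maxY);
    }
}
```

For example, `ScreenBoundsFromMesh.GetScreenRect(Camera.main, mf.sharedMesh, mf.transform)` with `mf` being the object’s MeshFilter. Calling WorldToScreenPoint per vertex is simple but can get slow for dense meshes.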
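The post linked below shows how to build a matrix for transforming from world space toward screen space (ignore its second line, which inverts the matrix and ends up going from screen to world). Strictly speaking, the product below maps world space to clip space; a perspective divide and a scale to the camera’s pixel size are still needed to get screen pixels.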
```csharp
// World space -> camera (view) space -> clip space
Matrix4x4 world2Screen = camera.projectionMatrix * camera.worldToCameraMatrix;
```
https://discussions.unity.com/t/what-is-the-matrix-equivalent-to-camera-worldtoscreenpoint/149611/2
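You might find that using MultiplyPoint on this matrix is faster than calling Camera.WorldToScreenPoint for every point. Note that MultiplyPoint performs the perspective divide, so its result is in normalized device coordinates (x and y between -1 and 1 for visible points); you still need to map those to pixels, e.g. pixelX = (x * 0.5 + 0.5) * camera.pixelWidth, and likewise for y with pixelHeight. If you need to start from the object’s local space, compute world2Screen * transform.localToWorldMatrix (in that order) once and reuse the combined matrix for every vertex.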
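The matrix approach can be sketched as a single helper that combines local-to-world, view, and projection once and then projects every vertex. The names are hypothetical. Instead of MultiplyPoint it multiplies by a Vector4 so it can check the clip-space w before dividing: MultiplyPoint hides w, and points behind the camera (w ≤ 0) would otherwise land at misleading positions. The pixel mapping assumes the camera’s viewport covers the whole screen or render target; in that case the result should match WorldToScreenPoint up to floating-point differences.

```csharp
using UnityEngine;

// Sketch: project many vertices with one precomputed local -> clip matrix.
public static class ScreenBoundsFromMatrix
{
    public static Rect? GetScreenRect(Camera cam, Transform t, Vector3[] localVertices)
    {
        if (localVertices.Length == 0) return null;

        // Combined once: local -> world -> camera (view) -> clip space.
        Matrix4x4 localToClip =
            cam.projectionMatrix * cam.worldToCameraMatrix * t.localToWorldMatrix;

        float minX = float.MaxValue, minY = float.MaxValue;
        float maxX = float.MinValue, maxY = float.MinValue;

        foreach (Vector3 v in localVertices)
        {
            Vector4 clip = localToClip * new Vector4(v.x, v.y, v.z, 1f);
            if (clip.w <= 0f) return null; // at or behind the camera plane

            // Perspective divide -> normalized device coordinates.
            float ndcX = clip.x / clip.w;
            float ndcY = clip.y / clip.w;

            // NDC -> pixels, origin at the bottom-left.
            float sx = (ndcX * 0.5f + 0.5f) * cam.pixelWidth;
            float sy = (ndcY * 0.5f + 0.5f) * cam.pixelHeight;

            minX = Mathf.Min(minX, sx);
            minY = Mathf.Min(minY, sy);
            maxX = Mathf.Max(maxX, sx);
            maxY = Mathf.Max(maxY, sy);
        }

        return Rect.MinMaxRect(minX, minY, maxX, maxY);
    }
}
```

Cache the vertex array (e.g. from `mesh.vertices`) once rather than per frame, since reading Mesh.vertices allocates a new copy each time.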