Hello, I have been trying to implement a pose estimation pipeline with the UR3 robot (similar to the Pose Estimation Tutorial). I am using the Perception Package for data collection and its 3D bounding box labeler to store the ground truth. I want to use a pose estimation model similar to DOPE ([1809.10790] Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects). This method needs the 2D image locations of the 3D bounding box vertices as ground truth for training. During inference, a PnP algorithm (cv2.solvePnP()) then recovers the object's pose from the predicted 2D points, and that pose can be used for pick-and-place as shown in the tutorial.
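A minimal sketch of the DOPE-style keypoint set, assuming the 3D bounding box is described by its full edge lengths (`size`) and centered on the object origin. DOPE predicts the 8 cuboid vertices plus the centroid, and these 9 object-frame points are what would later be passed to `cv2.solvePnP` as `objectPoints`. The function name `cuboid_keypoints` and the vertex ordering are hypothetical; whatever order is used must match the order the network is trained to predict.

```python
import numpy as np


def cuboid_keypoints(size, center=(0.0, 0.0, 0.0)):
    """Return the 8 corners plus the centroid of an axis-aligned box.

    size   -- (sx, sy, sz) full edge lengths in the object frame
    center -- box center in the object frame (assumed to be the origin)
    Returns a (9, 3) float array: rows 0-7 are corners, row 8 is the centroid.
    """
    half = np.asarray(size, dtype=float) / 2.0
    c = np.asarray(center, dtype=float)
    # Every +/- combination of the half extents gives one of the 8 corners.
    signs = np.array([[sx, sy, sz]
                      for sx in (-1.0, 1.0)
                      for sy in (-1.0, 1.0)
                      for sz in (-1.0, 1.0)])
    corners = c + signs * half
    # Append the centroid as the 9th keypoint, as DOPE does.
    return np.vstack([corners, c])
```

At inference, something like `ok, rvec, tvec = cv2.solvePnP(cuboid_keypoints(size), predicted_2d, K, None)` would recover the pose, where `predicted_2d` is a hypothetical (9, 2) array of network outputs and `K` is a pixel-unit intrinsic matrix.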
(I am a little new to the field of computer vision, so please bear with me :))
I have read the documentation on reading the JSON dataset (…6/manual/Schema/Synthetic_Dataset_Schema.html), which says the intrinsic matrix of the camera sensor can be found under captures.sensor.camera_intrinsic. I also checked the MathWorks reference linked from the same page. However, the matrix in the JSON captures does not seem to match that definition.
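For reference, this is the pinhole intrinsic matrix in the convention OpenCV uses (`cv2.projectPoints`, `cv2.solvePnP`). Here $f_x, f_y$ are focal lengths in pixels, $(c_x, c_y)$ is the principal point in pixels, $s$ is the skew (0 for most cameras), $R, \mathbf{t}$ map an object point $\mathbf{X}$ into the camera frame, and $\lambda$ is that point's depth. Some versions of the MathWorks documentation write the transpose of this matrix (a row-vector convention), which is worth keeping in mind when comparing.

```latex
K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left( R\,\mathbf{X} + \mathbf{t} \right)
```

With the image origin at the top-left corner, $f_x, f_y > 0$, $c_x, c_y$ are roughly half the image width and height, and the bottom-right entry is exactly 1. A matrix with a negative entry and small, unitless values is therefore probably not in this form.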
I found some negative values in the matrix (see the attached example capture file). Can someone explain this? Is this really the intrinsic matrix of the camera?
Also, are the parameters of this matrix not expressed in pixels?
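One plausible explanation, offered as an assumption rather than a confirmed fact about the Perception Package, is that the stored 3x3 matrix is the upper-left block of the camera's OpenGL-style projection matrix (Unity's `Camera.projectionMatrix` follows that convention). In such a matrix the diagonal terms are unitless (they map to normalized device coordinates in [-1, 1], not pixels), and the bottom-right entry is -(far+near)/(far-near), which is always negative and close to -1. If that is what the JSON holds, a pixel-unit K can be built once the image width and height are known. The sketch assumes a symmetric frustum (no lens shift), so the principal point sits at the image center; the helper name is hypothetical.

```python
import numpy as np


def projection3x3_to_pixel_intrinsics(p, width, height):
    """Convert a 3x3 OpenGL-style projection block to a pixel-unit K.

    ASSUMPTION: `p` is the upper-left 3x3 of a symmetric-frustum perspective
    projection matrix, so p[0][0] = cot(fov_x / 2), p[1][1] = cot(fov_y / 2)
    and p[2][2] = -(far + near) / (far - near) < 0 (not used here).
    width, height -- image size in pixels.
    """
    p = np.asarray(p, dtype=float)
    fx = p[0, 0] * width / 2.0   # NDC x spans `width` pixels over 2 units
    fy = p[1, 1] * height / 2.0  # NDC y spans `height` pixels over 2 units
    cx = width / 2.0             # principal point at the image center
    cy = height / 2.0            # (symmetric frustum assumed)
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])
```

A quick sanity check is to project a known object point with the resulting K and overlay it on the captured image; if it lands on the object, the assumption holds for your setup.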
I am planning to use cv2.projectPoints(object_vertex_coordinates, rvec, tvec, camera_intrinsic_matrix,…). I can derive rvec and tvec, as well as the vertex coordinates of the bounding box (from its size), from the annotations. So, for camera_intrinsic_matrix, should I use captures.sensor.camera_intrinsic, as mentioned earlier?
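A sketch of the projection step, written with NumPy so each stage is visible. It rests on assumptions worth checking against your own data: the annotation rotation is a quaternion stored in (x, y, z, w) order, the translation is the box center expressed in the camera frame, and that camera frame is Unity's (x right, y up, z forward), so the y axis is flipped to reach OpenCV's convention (x right, y down, z forward). `K` must be a pixel-unit intrinsic matrix. All function names are hypothetical.

```python
import itertools

import numpy as np


def quat_to_matrix(q_xyzw):
    """3x3 rotation matrix from a quaternion in (x, y, z, w) order (assumed)."""
    q = np.asarray(q_xyzw, dtype=float)
    x, y, z, w = q / np.linalg.norm(q)  # normalize to guard against rounding
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def box_corners(size):
    """8 corners of a box with full edge lengths `size`, centered at the origin."""
    half = np.asarray(size, dtype=float) / 2.0
    return np.array(list(itertools.product((-1.0, 1.0), repeat=3))) * half


def project_box(size, rotation_xyzw, translation, K):
    """Project the box corners to pixel coordinates with a pinhole model.

    Assumes `translation` is the box center in a Unity-style camera frame
    (x right, y up, z forward) and flips y to match OpenCV (y down).
    """
    R = quat_to_matrix(rotation_xyzw)
    # Rotate each corner, then move it to the box center in the camera frame.
    corners_cam = box_corners(size) @ R.T + np.asarray(translation, dtype=float)
    corners_cv = corners_cam * np.array([1.0, -1.0, 1.0])  # y-up -> y-down
    uvw = corners_cv @ np.asarray(K, dtype=float).T         # homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]                          # divide by depth
```

The same points should come out of the first return value of `cv2.projectPoints(corners @ S, rvec, S @ t, K, None)` (reshaped from (8, 1, 2)), with `S = np.diag([1.0, -1.0, 1.0])` and `rvec, _ = cv2.Rodrigues(S @ R @ S)`, since S(Rv + t) = (SRS)(Sv) + St. Comparing the two is a useful cross-check. If the overlay appears mirrored vertically, the handedness assumption is the first thing to revisit.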
Along with answers, any resources for learning more about these topics would also be very helpful. Thank you for your time!