I guess the camera type is not really important, just the ability to get the x and y size in pixels, perhaps from the Renderer component of a GameObject.
I would cast a ray of a fixed length through each of the two pixels. The endpoints of the rays give you two points in space, and you can calculate the distance between them. Then you can apply the formula linked above to calculate the required scale, and place the object at the midpoint between the two points.
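The steps above can be sketched in plain Python so the math is visible without an engine. This is only a rough sketch under assumptions of mine: a perspective camera at the origin looking down +z, screen coordinates with the origin at the bottom left, and a scale computed as "distance between the endpoints divided by the object's unscaled width" (the linked formula may differ). All function and parameter names are made up for illustration. In Unity the ray part would roughly correspond to `Camera.ScreenPointToRay` plus `Ray.GetPoint`.

```python
import math

def pixel_ray_direction(px, py, width, height, vertical_fov_deg):
    """Unit direction in camera space (x right, y up, z forward) through a screen point.

    Assumes a perspective camera at the origin and a bottom-left screen origin.
    """
    aspect = width / height
    half_height = math.tan(math.radians(vertical_fov_deg) / 2)
    # Map the screen point to the range [-1, 1] on both axes.
    ndc_x = 2 * px / width - 1
    ndc_y = 2 * py / height - 1
    x = ndc_x * half_height * aspect
    y = ndc_y * half_height
    z = 1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)

def point_on_ray(direction, length):
    """Endpoint of a ray that starts at the camera origin and has the given length."""
    return tuple(c * length for c in direction)

def fit_between_pixels(pixel_a, pixel_b, screen_size, vertical_fov_deg,
                       ray_length, object_width):
    """Return (scale, midpoint) for an object that should span the two pixels.

    object_width is the unscaled width of the object (hypothetical input).
    """
    width, height = screen_size
    end_a = point_on_ray(
        pixel_ray_direction(*pixel_a, width, height, vertical_fov_deg), ray_length)
    end_b = point_on_ray(
        pixel_ray_direction(*pixel_b, width, height, vertical_fov_deg), ray_length)
    span = math.dist(end_a, end_b)        # distance between the ray endpoints
    scale = span / object_width           # assumed scale formula
    midpoint = tuple((a + b) / 2 for a, b in zip(end_a, end_b))
    return scale, midpoint

if __name__ == "__main__":
    scale, midpoint = fit_between_pixels(
        (300, 300), (500, 300), (800, 600), 60.0, 10.0, 2.0)
    print("scale:", scale)
    print("midpoint:", midpoint)
```

Since no physics raycast is involved here, nothing can block the rays. The caveats about asymmetric or rotated objects still apply to the midpoint and the width.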
What you should consider:
If the rays hit something, you have a problem. Maybe you should set the layer mask flags so the raycast excludes all layers.
If the object is not symmetrical, you can't simply place it in the middle.
If the object should rotate, you may want to use its axis-aligned bounding box to determine the width.
The object could still be occluded by other stuff.
Maybe consider using a second camera that renders only this object on top. That would help with the occlusion problem.
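For the rotation point above, here is a small sketch of how the width of the axis-aligned bounding box changes when a rectangle is rotated in the screen plane. It assumes a flat rectangle rotating around the view axis, which is a simplification of mine; for a real 3D object you would take the bounds from the engine instead. The function name is hypothetical.

```python
import math

def aabb_width_of_rotated_rect(width, height, angle_deg):
    """Width of the axis-aligned bounding box of a rectangle rotated by angle_deg."""
    angle = math.radians(angle_deg)
    # Project both edges of the rectangle onto the horizontal axis and add them up.
    return abs(width * math.cos(angle)) + abs(height * math.sin(angle))

if __name__ == "__main__":
    for angle in (0, 30, 45, 90):
        print(angle, aabb_width_of_rotated_rect(2.0, 1.0, angle))
```

If you scale by this width instead of the unrotated one, the object keeps fitting between the two points while it rotates, at the cost of looking smaller at some angles.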