I understand that the function ComputeScreenPos takes a clip-space point and converts it to screen space.
For example, if my screen size is 800x600 and my clip-space point is (0.5, 0.5), then ComputeScreenPos should return a screen-space position of (400, 300). However, it doesn’t… at least not directly. Instead you have to divide your x and y by the w component to get your final screen-space position. Why do you need to divide by w? What space are the x and y in before you divide by w? Furthermore, why doesn’t ComputeScreenPos do the divide for you automatically? I am not critiquing the implementation; I would just like to know how this works at a low level.
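The divide being asked about can be sketched numerically. This is plain Python with hypothetical helper names, not Unity's actual code (the real ComputeScreenPos also applies a platform-dependent y flip), but it shows the remapping and the per-pixel divide:

```python
def compute_screen_pos(clip):
    # Sketch of what a ComputeScreenPos-style helper does (hypothetical
    # stand-in; Unity's real helper also handles a platform-dependent
    # y flip): remap clip-space xy from [-w, w] to [0, w], so that the
    # later divide by w lands in the [0, 1] range.
    x, y, z, w = clip
    return ((x + w) * 0.5, (y + w) * 0.5, z, w)

def to_pixels(screen_pos, screen_w, screen_h):
    x, y, _, w = screen_pos
    # The perspective divide: done per pixel, after interpolation.
    u, v = x / w, y / w               # normalized [0, 1] coordinates
    return (u * screen_w, v * screen_h)

# Clip-space (0, 0) with w = 1 is the center of an 800x600 screen:
print(to_pixels(compute_screen_pos((0.0, 0.0, 0.5, 1.0)), 800, 600))
# -> (400.0, 300.0)
```

Note that (assuming w = 1) it is clip-space (0, 0), not (0.5, 0.5), that lands at the screen center: clip-space x and y run from -w to +w, not 0 to 1.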
Dividing by w is standard in perspective projection. The reason it’s not done for you is that doing it early breaks interpolation. Values and vectors are output by the vertex shader, interpolated across the triangle by the rasterizer, and then handed to the pixel shader. If x, y and w are interpolated and you divide by w afterwards, the result is correct. If you divide first and interpolate x/w and y/w, the result is incorrect.
Another way of saying it is that you can’t linearly interpolate values in perspective space, only in a space where they still vary linearly. That’s why you get the coordinates in homogeneous form and you have to do the division per pixel.
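A quick numeric sketch of why the order matters, using made-up vertex values: interpolating x and w separately and dividing afterwards gives a different answer than interpolating the already-divided x/w.

```python
def lerp(a, b, t):
    return a + (b - a) * t

# Two vertices of an edge, in homogeneous coordinates (x, w).
x0, w0 = 0.0, 1.0     # near vertex
x1, w1 = 10.0, 5.0    # far vertex

t = 0.5  # halfway between the vertices in the interpolated space

# Correct: interpolate x and w separately, divide per pixel.
correct = lerp(x0, x1, t) / lerp(w0, w1, t)   # 5.0 / 3.0 ~= 1.667

# Incorrect: divide per vertex, then interpolate the results.
naive = lerp(x0 / w0, x1 / w1, t)             # (0.0 + 2.0) / 2 = 1.0

print(correct, naive)  # the two disagree whenever w0 != w1
```

Whenever the two vertices have different w (i.e. different depths), the two orders of operations disagree, which is why the divide has to be deferred to the pixel shader.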
It’s often referred to as the perspective divide. To create perspective, all 3D coordinates are carried through 4-dimensional homogeneous space, with depth stored in w. Dividing by w projects those positions back into 3-dimensional space.
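As a tiny illustration with assumed numbers (the projection matrix itself simplified away): a perspective projection ends up with w proportional to view-space depth, so the divide by w is exactly what makes distant points crowd toward the center of the screen.

```python
# Same sideways offset, increasing depth. A perspective projection
# puts (something proportional to) depth into w, so x/w shrinks
# with distance -- that shrinking IS the foreshortening.
x_clip = 1.0
ndc_x = [x_clip / depth for depth in (1.0, 2.0, 4.0)]  # depth plays the role of w
print(ndc_x)  # [1.0, 0.5, 0.25] -- farther points move toward the center
```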