Shaders: What is clip space?

If you write vertex and fragment programs in Unity, you will be familiar with this line, which shows up in almost every vertex program:

o.vertex = UnityObjectToClipPos(v.vertex);

In theory I understand it: it transforms the vertex from its local space into the camera's clip space using the MVP matrix. But what are the conventions of clip space? Is the bottom left of the rendering screen (0, 0) in (x, y), increasing to (1, 1) at the top right corner? And what are the z and w values in clip space, and what range does each of them cover?

To start, clip space is often conflated with NDC (normalized device coordinates), but there is a subtle difference: NDC are formed by dividing the clip space coordinates by w (also known as the perspective divide). In clip space itself the bounds depend on w: a point is inside the view volume when -w ≤ x ≤ w and -w ≤ y ≤ w. NDC bounds are “normalized”, so they stay the same no matter which projection was used.

The conversion from clip space to NDC happens after the vertex shader is run and is done automatically for you. Whatever you put in your w component will be used to divide your xyz components. You output your clip space position as (x, y, z, w) in the vertex shader and the graphics API converts it to (x/w, y/w, z/w, 1).
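To see that divide for yourself, here is a minimal sketch of a debug shader that keeps an untouched copy of the clip space position and performs the divide by hand in the fragment program. The shader name and the clipPos field are made up for illustration; UnityObjectToClipPos and appdata_base come from UnityCG.cginc.

```hlsl
// Hypothetical debug shader: performs the perspective divide by hand.
Shader "Hypothetical/ClipToNdcDebug"
{
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            struct v2f
            {
                float4 vertex  : SV_POSITION; // consumed by the rasterizer
                float4 clipPos : TEXCOORD0;   // untouched copy of the clip space position
            };

            v2f vert (appdata_base v)
            {
                v2f o;
                o.vertex  = UnityObjectToClipPos(v.vertex);
                o.clipPos = o.vertex;
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                // The same divide that is applied to SV_POSITION after the vertex stage.
                float3 ndc = i.clipPos.xyz / i.clipPos.w;
                // Remap x and y from [-1, 1] to [0, 1] so they can be shown as red and green.
                return fixed4(ndc.xy * 0.5 + 0.5, 0.0, 1.0);
            }
            ENDCG
        }
    }
}
```

Put it on any mesh: red should ramp up from the left edge of the screen to the right, and green along the vertical axis, independent of where the mesh sits in the scene. The vertical direction may appear flipped in some setups, for example when rendering into a texture on a Direct3D-like platform.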

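Regardless of platform, the NDC bounds for left/right/top/bottom are the same. With l, r, b and t being the left, right, bottom and top edges of the view volume, the projection followed by the divide maps:
the x-coordinate from [l, r] to [-1, 1],
the y-coordinate from [b, t] to [-1, 1].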
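So the (0, 0) to (1, 1) layout from the question is not what clip space or NDC give you, but it is one remap away. A sketch of how you might get it, using Unity's ComputeScreenPos helper from UnityCG.cginc (the shader name and the screenPos field are hypothetical):

```hlsl
// Hypothetical debug shader: shows [0, 1] screen coordinates as a colour.
Shader "Hypothetical/ScreenUvDebug"
{
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            struct v2f
            {
                float4 vertex    : SV_POSITION;
                float4 screenPos : TEXCOORD0;
            };

            v2f vert (appdata_base v)
            {
                v2f o;
                o.vertex    = UnityObjectToClipPos(v.vertex);
                // xy come back in the range [0, w]; the divide is left to the fragment program.
                o.screenPos = ComputeScreenPos(o.vertex);
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                // Divide by w per pixel to land in [0, 1] on both axes.
                float2 screenUV = i.screenPos.xy / i.screenPos.w;
                return fixed4(screenUV, 0.0, 1.0);
            }
            ENDCG
        }
    }
}
```

The divide is done in the fragment program rather than the vertex program because the divided value does not interpolate correctly across a triangle under perspective.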

Unfortunately, the depth coordinate differs depending on whether you are on an OpenGL-like platform or a Direct3D-like one.

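Direct3D-like: The clip space depth goes from 0.0 at the near plane to +1.0 at the far plane. This applies to Direct3D, Metal and consoles. Note that recent Unity versions use a reversed depth buffer on these platforms (UNITY_REVERSED_Z is defined in shaders), in which case the range runs from +1.0 at the near plane to 0.0 at the far plane.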

OpenGL-like: The clip space depth goes from –1.0 at the near plane to +1.0 at the far plane. This applies to OpenGL and OpenGL ES.

Also note: from within the shader you can use the UNITY_NEAR_CLIP_VALUE built-in macro to get the near plane value based on the platform.
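As a rough sketch of how that macro could be used, the shader below rescales the post-divide depth so that the near plane is always 0 and the far plane is always 1, whatever the platform. It assumes the far plane value is 0.0 where Unity defines UNITY_REVERSED_Z and 1.0 everywhere else; HYPOTHETICAL_FAR_CLIP_VALUE, the shader name and the clipPos field are made up for this example and are not Unity built-ins.

```hlsl
// Hypothetical debug shader: platform-independent 0..1 depth, shown as grey.
Shader "Hypothetical/NdcDepthDebug"
{
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            // Assumption: far plane is 0 with reversed depth, 1 otherwise.
            #if defined(UNITY_REVERSED_Z)
                #define HYPOTHETICAL_FAR_CLIP_VALUE 0.0
            #else
                #define HYPOTHETICAL_FAR_CLIP_VALUE 1.0
            #endif

            struct v2f
            {
                float4 vertex  : SV_POSITION;
                float4 clipPos : TEXCOORD0; // copy of the clip space position
            };

            v2f vert (appdata_base v)
            {
                v2f o;
                o.vertex  = UnityObjectToClipPos(v.vertex);
                o.clipPos = o.vertex;
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                // Depth after the perspective divide, in the platform's own range.
                float ndcZ = i.clipPos.z / i.clipPos.w;
                // Rescale so that near maps to 0 and far maps to 1 on every platform.
                float t = (ndcZ - UNITY_NEAR_CLIP_VALUE)
                        / (HYPOTHETICAL_FAR_CLIP_VALUE - UNITY_NEAR_CLIP_VALUE);
                return fixed4(t, t, t, 1.0);
            }
            ENDCG
        }
    }
}
```

With a perspective camera this value is not linear in distance, so most of the scene will come out close to white; it is meant to show the range conventions, not to serve as a usable linear depth.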