Dissecting the projection matrix

2020-10-24

Within a rasterization pipeline, once the vertex coordinates have been transformed into the camera coordinate frame, a non-linear perspective transformation is applied. That transformation itself, the division by the homogeneous w coordinate, is performed by the pipeline implementation.[1] However, the vertex coordinates first need to be adjusted to match the desired viewport and field-of-view (FOV) configuration.

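This adjustment is accomplished by multiplying the homogeneous vertex coordinates by the perspective matrix. The latter, for a pinhole camera, usually takes the following form:

$$
\begin{pmatrix}
A & 0 & 0 & 0 \\
0 & B & 0 & 0 \\
0 & 0 & C & D \\
0 & 0 & -1 & 0
\end{pmatrix}
$$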

This post dissects the meaning of the parameters A, B, C and D.

Focal length

A and B determine the aspect ratio and the ‘zoom’ of the camera. In computer graphics, it is common to specify them using field-of-view angles or frustum sizes at the near plane.

Photographers, however, use the focal length and the sensor size to describe these parameters. Given a focal length L and a physical sensor size (w,h), A and B are determined by:

A = 2L w⁻¹
B = 2L h⁻¹

Using the above formulae avoids unnecessary trigonometric functions and the dependence on the near plane, making the camera easier to specify. For example, when rendering with a 50mm lens on a full-frame (36 × 24 mm) camera, one would take A = 2 × 50 / 36 and B = 2 × 50 / 24.

Similarly, to fit an object of size S into the frame (e.g. for a dolly zoom), the focal length L is taken to be the distance to the object, and the sensor is scaled, keeping its aspect ratio, so that its height covers the object:

A = 2L S⁻¹ h w⁻¹
B = 2L S⁻¹

Depth range

C and D scale and offset the depth so that the depth values of the scene fit into the range of values that the depth buffer can store.

Negative one to one

Legacy OpenGL used [−1, 1] for the normalized device z coordinate. Accordingly, C and D are:

C = −(f + n) (f − n)⁻¹
D = −2 fn (f − n)⁻¹

Here n is the distance to the near plane, which maps to −1, and f is the distance to the far plane, which maps to 1.

Zero to one

Direct3D and Vulkan use the [0, 1] range. The corresponding C and D are:

C = −f (f − n)⁻¹
D = −fn (f − n)⁻¹

This mode can be activated in modern OpenGL with glClipControl(..., GL_ZERO_TO_ONE), and is needed for the next section.
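A short derivation sketch showing that these values follow from the [−1, 1] ones, assuming the projection produces w_clip = −z (camera looking down −z, consistent with the signs of C and D above). Remapping z_ndc from [−1, 1] to [0, 1] means z′_ndc = (z_ndc + 1)/2, which in clip space reads z′_clip = (z_clip + w_clip)/2:

$$
z'_{\text{clip}} = \frac{(Cz + D) + (-z)}{2} = \frac{C - 1}{2}\,z + \frac{D}{2}
$$

$$
C' = \frac{C - 1}{2} = \frac{-(f + n) - (f - n)}{2(f - n)} = -\frac{f}{f - n},
\qquad
D' = \frac{D}{2} = -\frac{fn}{f - n}
$$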

Reverse-Z

With fixed-point depth buffers, precision is best close to the near plane, and the farther away the far plane is, the lower that precision becomes. This is a significant issue when rendering scenes with both distant and up-close objects, e.g. a first-person view showing the player's own hands and faraway mountains.

Floating-point depth buffers solve this problem: floating-point numbers have the same average relative precision for small values as for large ones.[2]

However, the value interpolated and stored in the depth buffer is an affine function of the inverse of the depth, so most of the scene maps to values close to the one assigned to the far plane. Therefore, to utilize the floating-point precision available near zero, the far plane needs to be mapped to 0 and the near plane to 1, which is the opposite of the classical way of doing things. This is achieved by setting:

C = n (f − n)⁻¹
D = fn (f − n)⁻¹

Taking f → ∞:

C = 0
D = n

Additionally, since the meaning of near and far z-values is now reversed, the depth-test comparison and the clear value have to be flipped:

glDepthFunc(GL_GREATER); // default is GL_LESS
glClearDepth(0); // default is 1

With a floating-point depth buffer, this allows resolving geometry from the near plane all the way to infinity. Moreover, n can be chosen as small as necessary, the limiting factor being the exponent range rather than the precision. On modern hardware there's rarely any justification for using anything other than this mapping.

Near plane clipping

Enabling depth clamping with glEnable(GL_DEPTH_CLAMP) ensures that geometry closer than the near plane is still rendered, thus eliminating ‘see-through’ artifacts when the camera gets too close. Such geometry is clamped to the near-plane depth value and therefore won't be depth-tested correctly against itself, though this is rarely an issue.

On Nvidia hardware, one can disable clipping and clamping entirely with GL_NV_depth_buffer_float. If doing so, it's best to set D = 1 to maximize the depth exponent range.

No matrices

On superscalar architectures (which modern GPU SMs are), since all the matrices above are sparse, it may be beneficial to pass just the non-zero parameters A, B, C and D to the shader rather than performing a full 4×4 matrix multiplication. [benchmarks needed]

Footnotes

[1] How exactly does OpenGL do perspectively correct linear interpolation?
[2] Depth precision visualized, by Nathan Reed.