This is primarily an issue on TBDR (tile-based deferred rendering) architectures, where render target loads and stores are extremely expensive. Timings are given further down, but first here is an outline of what is happening in the forward renderer:
Outline:
- Shadow Map Passes
- Depth Pre-Pass (Not executed with my config)
  - Store & Resolve (Haven’t tested but based on how other passes work, probably a safe assumption)
  - Probably causes unneeded load of depth buffer
- Opaque Forward Pass
  - Potentially load pre-pass depth
  - Store MSAA
  - Resolve & Store [Unused in subsequent passes]
- Skybox Forward Pass
  - Load Stored MSAA [Unneeded Load]
  - Store MSAA
  - Resolve & Store [Potentially unneeded]
- Opaque Capture pass
  - Load resolved from skybox pass
  - Downsample & Store
- Transparent Forward pass
  - Load Stored MSAA Framebuffers from Skybox Pass [Potentially unneeded load]
  - Store MSAA
  - Store Resolved [Unused]
- Postprocessing
  - SMAA
    - Store MSAA [Unused]
    - Store Resolved

Timings: [iPhone 8s]
Used Xcode frame capture tools for analysis and timings.
- Opaque Pass: ~7.6ms
  - Draw call total: ~5.3ms
  - Store overhead: ~2.3ms
- Skybox pass: ~4.5ms
  - Draw call total: 30us
  - Load / Store overhead: ~4.4ms
- Opaque capture pass: ~0.5ms
- Transparent pass: ~3.5ms
  - Draw call total: ~300us
  - Load / Store overhead: ~3.2ms
- Post process SMAA: ~2.5ms
  - Draw call: ~1.8ms
  - Store unused MSAA overhead?? < 0.7ms
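
To put the numbers above in perspective, here is a small Python sketch that adds up the per-pass totals and the load / store overheads. The figures are copied from the list; the dictionary layout and names are just for illustration. Two assumptions: the SMAA overhead was measured as "< .7ms", so 0.7 is used as an upper bound, and the opaque capture pass is counted as having no separate overhead because no breakdown was measured for it.

```python
# Per-pass timings in milliseconds, copied from the list above.
# Each entry: (total pass time, load/store overhead within that pass).
passes = {
    "opaque": (7.6, 2.3),
    "skybox": (4.5, 4.4),
    "opaque_capture": (0.5, 0.0),  # assumption: no overhead breakdown measured
    "transparent": (3.5, 3.2),
    "smaa": (2.5, 0.7),            # assumption: "< .7ms" taken as an upper bound
}

total_ms = sum(total for total, _ in passes.values())
overhead_ms = sum(overhead for _, overhead in passes.values())
share = overhead_ms / total_ms

print(f"total: {total_ms:.1f} ms")
print(f"load/store overhead: {overhead_ms:.1f} ms ({share:.0%} of total)")
```

Under those assumptions, load / store overhead comes to roughly 10.6ms out of about 18.6ms for these passes, so more than half of the measured time is not spent on draw calls.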

Fixes:
- Combine Opaque & Skybox pass
  - Removes store overhead in opaque pass: ~2.3ms
  - Removes load overhead in Skybox pass: ~2.2 - 4.4ms (2.2ms if there is an opaque capture pass)
- Potentially combine Opaque, Skybox and Transparent
  - Only possible if there is no opaque capture pass
  - Saves an additional ~3.2ms on top of the combined Opaque & Skybox savings
- Post Process SMAA:
  - Don’t store MSAA
  - Savings: < 0.7ms?
- It seems like the default is store & resolve after each RenderPass?
  - Should only store based on subsequent pass needs?
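
The last point (only store what a subsequent pass needs) can be sketched as a backward walk over the pass list. This is a minimal Python model, not the engine's actual code: `RenderPass`, `plan_stores`, and the attachment names `msaa` and `resolved` are all hypothetical, and the frame below is a simplified three-pass version of the outline with no opaque capture pass and with the final resolved image presented.

```python
from dataclasses import dataclass, field


@dataclass
class RenderPass:
    name: str
    reads: set = field(default_factory=set)   # attachments loaded at pass start
    writes: set = field(default_factory=set)  # attachments rendered in the pass


def plan_stores(passes, presented):
    """Return {pass name: set of attachments worth storing}.

    An attachment written by a pass is stored only if a later pass reads it
    before it is overwritten, or if it is presented at the end of the frame.
    """
    plan = {}
    needed = set(presented)  # attachments some later consumer still needs
    for p in reversed(passes):
        plan[p.name] = p.writes & needed
        # Attachments written without being read are fully overwritten here,
        # so earlier passes do not need to store them for this pass.
        needed -= p.writes - p.reads
        needed |= p.reads
    return plan


# Hypothetical simplified frame: opaque -> skybox -> transparent.
frame = [
    RenderPass("opaque", reads=set(), writes={"msaa", "resolved"}),
    RenderPass("skybox", reads={"msaa"}, writes={"msaa", "resolved"}),
    RenderPass("transparent", reads={"msaa"}, writes={"msaa", "resolved"}),
]

plan = plan_stores(frame, presented={"resolved"})
for p in frame:
    print(p.name, "stores", sorted(plan[p.name]))
```

In this model the opaque and skybox passes only need to store the MSAA target, and the resolve is stored once, by the last pass. If a pass that reads `resolved` were inserted after the skybox pass (as the opaque capture pass does), the skybox resolve store would become necessary again. Merging the passes would go further and remove the intermediate MSAA stores and loads as well; this sketch only covers the store elision part.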