How to get rid of baseline overhead of URP compared to builtin renderer?

I’ve created a blank test project in Unity 2020.2 with the URP template and placed just a single sprite and an orthographic camera in the scene. I then tried to get the best possible performance with these settings:

  • Orthographic camera
  • No post processing
  • No anti aliasing
  • No shadows
  • No lights
  • No depth texture
  • No volume
  • No skybox
  • No HDR
  • Everything else at default

This runs at 33 fps on my target mobile device.

The same scene in a project with the builtin render pipeline is clamped at 60 fps.

Of course this is only an artificial test, but my real URP project is crawling along at 9 fps on device. I’m trying to optimize it, but I feel like I’m already out of options. How could I improve it much if even an empty URP scene doesn’t reach 60 fps?

Am I missing some crucial setup step for mobile projects? I do need 3D and lights and some shadows later, so I can’t switch to the 2D renderer, but I still expected at least the same performance for simple 2D/3D rendering as with the builtin pipeline.

Do you know if you’re CPU or GPU bound? Also what shaders are you using?

URP should be lighter on the CPU, thanks to the SRP Batcher and because it can shade multiple lights in a single pass instead of issuing a separate pass per light.

URP’s default “Lit” Shader can be slower on the GPU compared to the non-PBR legacy shaders, as it does a lot more lighting/reflection calculations, and also processes potentially multiple lights and shadows in the same pass.

So you could try switching your shaders to Simple Lit or Unlit, to rule out shader complexity being an issue.

In general, profiling is the best way to determine what’s slowing your game down.

If you swap out all your shaders for unlit, then you can determine if shader complexity/fillrate is the bottleneck.
If you swap out all your models for ultra low-poly quads, then you can determine if you’re bound by vertex processing or rasterization.

If rendering quads with an unlit shader makes no difference to performance, then you’re likely CPU bound.
This means you either have too many draw calls, or there are too many individual meshes that need to be culled by the CPU every frame.

You can see how long culling is taking in the profiler. Draw calls should be a couple of hundred or fewer on an average mobile device. (Specifically, “SetPass calls” is the number you want to look at, though the SRP Batcher makes their overhead a lot lower than in the legacy pipeline.)

Finally, if none of these things help, then you might be bottlenecked by something unrelated to rendering, e.g. scripting, physics, or some kind of Debug.Log/Error triggering lots of stack traces every frame.
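
To make the elimination steps above concrete, here is a rough sketch of the decision logic in Python. It is only an illustration: the function name `classify_bottleneck`, the 10% tolerance, and the frame times in the usage lines are all made up for the example, and real numbers should come from the profiler on your device.

```python
def classify_bottleneck(baseline_ms, unlit_ms, quads_ms, tolerance=0.10):
    """Guess the bottleneck from three measured frame times (in milliseconds).

    baseline_ms: frame time of the unmodified scene
    unlit_ms:    frame time with all shaders swapped for unlit
    quads_ms:    frame time with all models swapped for low-poly quads
    tolerance:   hypothetical threshold; a change smaller than this
                 fraction of the baseline is treated as "no difference"
    """
    def improved(test_ms):
        return test_ms < baseline_ms * (1.0 - tolerance)

    findings = []
    if improved(unlit_ms):
        findings.append("shader complexity / fill rate")
    if improved(quads_ms):
        findings.append("vertex processing / rasterization")
    if not findings:
        # Neither swap helped: the GPU is probably not the limit.
        findings.append("likely CPU bound or non-rendering work")
    return findings


# Hypothetical measurements, not taken from any real device.
print(classify_bottleneck(baseline_ms=50.0, unlit_ms=30.0, quads_ms=49.0))
print(classify_bottleneck(baseline_ms=50.0, unlit_ms=49.5, quads_ms=50.5))
```

If the last case applies, the next step is to check draw calls, culling time, and script/physics time in the profiler.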

Thanks, all of your tips make sense for optimizing in general. I’m still perplexed about my artificial test project. Here are some more random discoveries:

Again, I have only a default camera with a simple square sprite (4 x 4 px).

  • With builtin pipeline this runs at 60fps on my lowest-end target device
  • With URP this runs at a slightly lower framerate of 54 fps

I then switched to a slightly better device and saw on URP:

  • 60 fps when MSAA was disabled
  • 33 fps when MSAA was 2x

Back to the builtin pipeline:

  • 60 fps MSAA disabled
  • 60 fps MSAA 2x

And URP again with a more complex test setup (more sprites, a single 3D model):

  • 20 fps MSAA disabled
  • 17 fps MSAA 2x

Sure, I understand that MSAA costs some performance in general, although in the builtin pipeline there’s enough headroom that it doesn’t show. But MSAA seems to cut performance roughly in half (from 60 to 33 fps) in a nearly empty scene, yet costs only 3 fps when there’s already a lot going on (20 to 17). I do realize these drops are hard to compare, since fps differences aren’t on a linear scale, but even in milliseconds the cost is quite different. To me it looks as if there’s a certain baseline cost to using MSAA, or anything else, in URP when starting from nothing; as soon as I add more work it becomes more efficient, but it’s still slower than builtin.
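
For reference, here is the fps-to-milliseconds conversion for the URP numbers above, as a small Python sketch. The helper names are just for illustration. One caveat: 60 fps is the display cap, so the real frame time without MSAA may be shorter than 16.7 ms, which would make the first figure a lower bound.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def msaa_cost_ms(fps_off, fps_on):
    """Extra time per frame with MSAA on, compared to MSAA off."""
    return frame_time_ms(fps_on) - frame_time_ms(fps_off)


# (fps with MSAA disabled, fps with MSAA 2x), as measured on URP
measurements = {
    "single sprite": (60, 33),
    "complex scene": (20, 17),
}

for name, (fps_off, fps_on) in measurements.items():
    cost = msaa_cost_ms(fps_off, fps_on)
    print(f"{name}: {frame_time_ms(fps_off):.1f} ms -> "
          f"{frame_time_ms(fps_on):.1f} ms (+{cost:.1f} ms)")
```

So the apparent MSAA cost is about 13.6 ms per frame in the nearly empty scene versus about 8.8 ms in the complex one. If vsync is enabled, frame times can also snap to multiples of the refresh interval, which might be part of why the simple scene lands near half of 60 fps.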