Sounds like you’ve already done most of what is possible.
The only advice that comes to mind is:
- Make sure graphics jobs are enabled, and profile a standalone release build, not a development build or the editor
- Reduce the number of draw calls with instancing/batching, indirect draw calls and culling (e.g. occlusion culling)
- Test both the DX11 and DX12 backends.
- Maybe give the BatchRendererGroup API a try (but that’s probably for regular draw calls, not compute)
- If you are GPU bound, you could maybe use async compute in DX12, but it sounds like you are CPU bound
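To illustrate the instancing point, here is a minimal sketch of drawing many copies of one mesh with a single `Graphics.DrawMeshInstanced` call. The component name, the fields and the grid layout are made up for the example and are not taken from your project. It also assumes the material has "Enable GPU Instancing" ticked and that you draw regular meshes rather than dispatching compute work.

```csharp
using UnityEngine;

// Hypothetical example component; names and layout are illustrative only.
public class InstancedDrawExample : MonoBehaviour
{
    public Mesh mesh;          // assigned in the Inspector
    public Material material;  // assumed to have "Enable GPU Instancing" ticked
    public int count = 1000;   // DrawMeshInstanced accepts at most 1023 instances per call

    private Matrix4x4[] matrices;

    void Start()
    {
        count = Mathf.Min(count, 1023);
        matrices = new Matrix4x4[count];
        for (int i = 0; i < count; i++)
        {
            // Lay the instances out on a simple grid, 32 per row.
            Vector3 position = new Vector3(i % 32, 0f, i / 32) * 2f;
            matrices[i] = Matrix4x4.TRS(position, Quaternion.identity, Vector3.one);
        }
    }

    void Update()
    {
        // One call submits all instances instead of one draw call per object.
        Graphics.DrawMeshInstanced(mesh, 0, material, matrices, count);
    }
}
```

Whether this actually helps depends on where the CPU time goes, so compare the render thread timings in the profiler before and after.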
Also take a look at this excellent performance optimization guide from Unity: