I’m building a tabletop-view game with lots of different tiles and I’m looking for efficient ways to render these while keeping draw calls low. I’m using URP, so any solution needs to work there. At present I’m leveraging the SRP Batcher, though it’s not clear if this is the right approach - it does seem the quickest at the moment.
I’m rendering a circle of tiles around the player’s location.
I’ve considered disabling the GameObjects in each tile outside the circle, and I’ve also tried disabling just the MeshRenderer, which seems to be faster. I looked at rendering the tiles with Graphics.RenderMesh(), but this breaks batching, I end up with 1000s of draw calls, and it’s slower than disabling the MeshRenderers.
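For context, the toggling is essentially the below - a minimal sketch, not my exact code (`tiles`, `tileSize`, and `viewRadius` are illustrative):

```csharp
using UnityEngine;

// Minimal sketch of the MeshRenderer-toggling approach (illustrative names).
public class TileCircleToggle : MonoBehaviour
{
    public MeshRenderer[,] tiles;  // hypothetical: one renderer per grid cell
    public float tileSize = 1f;
    public float viewRadius = 25f; // circle radius in world units

    public void UpdateVisibility(Vector2 playerGridPos)
    {
        for (int x = 0; x < tiles.GetLength(0); x++)
        {
            for (int y = 0; y < tiles.GetLength(1); y++)
            {
                float dist = Vector2.Distance(new Vector2(x, y) * tileSize,
                                              playerGridPos * tileSize);
                // Toggling renderer.enabled proved faster than SetActive on the GameObject.
                tiles[x, y].enabled = dist <= viewRadius;
            }
        }
    }
}
```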
I was looking at creating a custom SRP, but I could not see a way of filtering to only those GameObjects near the circle around the player. I’ve also considered DOTS, but I don’t know if DOTS will be a benefit if the tiles are all different.
I’m also wondering if I should just combine all the meshes within the area and push that, but I’m concerned about bandwidth as I’d like this to run well on Quest HMDs.
Any suggestions for ways to rapidly disable many tiles at once, or to tackle this tile-based approach differently, would be greatly appreciated.
I have considered combining meshes too, which would make the enabling/disabling more viable, but I have other reasons for retaining the individual tiles.
I’m open to suggestions, especially for utilising DOTS or a custom SRP.
I really think you’re overthinking this. Make a mesh where the quads represent each visible tile, upload the per-tile data to the mesh each frame, and let the engine render it. Or, if you can, just use the built-in 2D tilemap system. Optimization shouldn’t be a consideration here, since even a twenty-year-old PC will handle either approach just fine as long as you don’t do anything too silly with it.
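Rough sketch of what I mean, assuming a flat grid and leaving the per-tile atlas UV lookup to your own data (`width`, `height`, and `tileSize` are example parameters):

```csharp
using UnityEngine;

// Sketch: one mesh where each visible tile is a quad, rebuilt when tiles change.
public static class TileMeshBuilder
{
    public static Mesh Build(int width, int height, float tileSize)
    {
        var verts = new Vector3[width * height * 4];
        var uvs = new Vector2[verts.Length];
        var tris = new int[width * height * 6];

        for (int y = 0, i = 0; y < height; y++)
        for (int x = 0; x < width; x++, i++)
        {
            int v = i * 4, t = i * 6;
            verts[v]     = new Vector3(x,     0, y)     * tileSize;
            verts[v + 1] = new Vector3(x + 1, 0, y)     * tileSize;
            verts[v + 2] = new Vector3(x,     0, y + 1) * tileSize;
            verts[v + 3] = new Vector3(x + 1, 0, y + 1) * tileSize;
            // uvs[v..v+3] would come from your per-tile atlas lookup.
            tris[t]     = v;     tris[t + 1] = v + 2; tris[t + 2] = v + 1;
            tris[t + 3] = v + 1; tris[t + 4] = v + 2; tris[t + 5] = v + 3;
        }

        var mesh = new Mesh();
        mesh.indexFormat = UnityEngine.Rendering.IndexFormat.UInt32; // can exceed 65k verts
        mesh.vertices = verts;
        mesh.uv = uvs;
        mesh.triangles = tris;
        return mesh;
    }
}
```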
Yeah, I’m thinking of doing this tbh, but as I said, I’m targeting Quest. It isn’t a consideration for PC at all, as brute force just works there.
I can bench pushing a large mesh and see. I should be clear that this is a 3D tile system, so each tile will be many polygons - i.e. 50x50 tiles at maybe 300 polys/tile, depending.
Ah yeah, I missed the whole Quest part of it. That sounds rough, because I’m pretty sure you want to be aiming for a max of around half a million verts per frame or less, and you’d already be in the 2 million range with the numbers above. Is this some kind of terrain system? Normally I think of tilemaps as consisting of just a two-poly quad per tile. At that point it really doesn’t matter how long it takes to upload the data to the tilemap, since rendering it will be your bottleneck. Though I’m just going by what I’ve read in the past - I can’t say from actual experience, so it might still be worth a quick benchmark.
EDIT:
Yeah, here’s an old post I recalled from a while back. Basically it’s not looking good. It seems like you’re going to have to significantly dial back either your tile count or your poly count per tile.
Yeah, I think the average is probably less than 300 polys/tile and I could dial it back, so many tiles could be a dozen triangles.
Essentially it’s a Zelda: Link’s Awakening style view - or think of the tabletop view of a physical game like D&D - and my plan is to have both outdoor regions and indoor, dungeon-like sections.
The Quest can do a decent number of polys, but draw calls are a problem, and given I’m looking at maps around 128x128, with any reasonable number of polys per tile I can be quite over budget. Plus I don’t want to be sending 500k worth of geometry across the data bus every frame unnecessarily.
I could also cut the visible area back and show a smaller region, maybe half the size.
In my experience on PC, draw calls are a non-issue using the SRP Batcher. SetPass calls are what you should avoid, which is fairly easy to do if you use the same shader for almost everything. I can get away with tens of thousands of draw calls and still stay at triple-digit framerates on some pretty mediocre hardware. That said, it’s worth checking on your actual hardware, since you never know. I would definitely do some tests first just to see what the limits are before going any further and committing to a tech stack that may or may not work.
Since these tiles will mostly be fixed in design (it sounds like you’re effectively emulating old sprite-based tile functionality, but with 3D meshes), I think your current idea of per-GameObject tiles might be the best bang for your buck.
You could take a screen-based approach to simplify things a bit. Each screen could be a single GameObject whose children are the individual tile MeshRenderers. It would then be trivial to disable/enable an entire screen’s worth of tiles based on where the player is in the world, just by deactivating/activating the parent screen object.
If you’re really short on memory for some reason, you could have just enough of the previously mentioned screen-tile hierarchies to display the current screen plus its eight neighbours. Then stream model references into the tile renderer objects from a secondary worldmap data structure. As one screen hierarchy moves off one side of the map, you can easily teleport it to the other side and update all of its tiles as needed.
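In code, the per-move work is about as cheap as it gets - a sketch, with `screens` and `ScreenSize` as stand-ins for your own layout:

```csharp
using UnityEngine;

// Sketch: one parent GameObject per "screen" of tiles; keep only the current
// screen and its eight neighbours active. Names are illustrative.
public class ScreenToggler : MonoBehaviour
{
    public GameObject[,] screens; // parent object per screen
    const int ScreenSize = 16;    // tiles per screen edge (example value)

    public void OnPlayerMoved(int tileX, int tileY)
    {
        int cx = tileX / ScreenSize, cy = tileY / ScreenSize;
        for (int sx = 0; sx < screens.GetLength(0); sx++)
        for (int sy = 0; sy < screens.GetLength(1); sy++)
        {
            bool near = Mathf.Abs(sx - cx) <= 1 && Mathf.Abs(sy - cy) <= 1;
            var go = screens[sx, sy];
            if (go.activeSelf != near) go.SetActive(near); // skip redundant toggles
        }
    }
}
```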
As I understand it, this is a top-down camera game.
If so, why are you doing the job of frustum culling manually? Is the circle smaller than the screen?
And how many tiles are visible on screen?
Do you have any frame-time metrics on the target hardware from your prototype, or are you just speculating?
Sorry, I should have been clearer: it’s not top-down, it’s tabletop view - as I said, Zelda-esque but with 3D perspective. Think a Lego Bricktales type of view on Quest. The circle will be variable in size within the view of the HMD, because the player will be able to move the view.
For the number of tiles, I’d like to be able to go to 50x50. The SRP Batcher is probably fine, but I’d rather it only be considering the 50x50 in view. That’s ~2500 MeshRenderers to consider each frame, or at least whenever the centre point moves, though I do have some ideas about toggling just the changed set.
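One idea for toggling just the changed set - an untested sketch that keeps a HashSet of currently visible tile indices and only touches the difference when the centre moves (all names illustrative):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch: instead of touching all ~2500 renderers, toggle only the tiles
// that enter or leave the circle when the centre moves.
public class DeltaVisibility : MonoBehaviour
{
    public MeshRenderer[] tiles; // flat array, index = y * width + x
    public int width = 128;
    public float radiusInTiles = 25f;

    readonly HashSet<int> visible = new HashSet<int>();

    public void MoveCentre(Vector2Int centre)
    {
        var next = new HashSet<int>(); // would be pooled in practice
        int r = Mathf.CeilToInt(radiusInTiles);
        int height = tiles.Length / width;
        for (int y = centre.y - r; y <= centre.y + r; y++)
        for (int x = centre.x - r; x <= centre.x + r; x++)
        {
            if (x < 0 || y < 0 || x >= width || y >= height) continue;
            if ((new Vector2(x, y) - centre).sqrMagnitude <= radiusInTiles * radiusInTiles)
                next.Add(y * width + x);
        }
        foreach (int i in visible) if (!next.Contains(i)) tiles[i].enabled = false; // left circle
        foreach (int i in next) if (!visible.Contains(i)) tiles[i].enabled = true;  // entered circle
        visible.Clear();
        visible.UnionWith(next);
    }
}
```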
The PC frame times for the best solutions were around 4ms. I plan to drop this on Quest 3 to see how it holds up, but given it’s 4ms on PC, and with ASW my frame budget is about 10ms, that’s probably costing around 40% of my frame budget. It might be a worthy trade-off, though many games typically run at sub-6ms on Quest - even Red Matter 2 is only around 8ms in the areas I’ve benched.
Clearly all of the geometry, textures, etc. could live on the GPU, and each frame I should be able to pick which 50x50 to draw, based on the centre point within the grid for the level.
Yeah, I’ve considered a fixed scene view, kind of like Moss I/II, which would definitely work! And yeah, the SetPass calls are like 2 in the editor, as I plan to have only a handful of shaders tbh.
I’m going to drop the thing on the Quest and see if it chugs as-is. It just feels like there should be a better way. I can have all of the tile models, textures, etc. on the GPU; I’d kind of like to be able to push a list of the 2500 objects, almost like IDs of which to draw. Surely a lot of this can be backed by buffer objects on the GPU?
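I’m not certain this is what GPU-resident drawing should look like, but if the tiles draw from a limited palette of meshes, one RenderMeshInstanced call per tile type might approximate the “push IDs, keep the data on the GPU” idea - a sketch, where `palette` and `visiblePlacements` are assumed structures of mine, not a real API:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch: group visible tiles by mesh type and issue one instanced draw per type.
public class InstancedTileDrawer : MonoBehaviour
{
    public Mesh[] palette;          // one mesh per tile type
    public Material sharedMaterial; // needs "Enable GPU Instancing" ticked

    // tile type -> world matrices of visible instances this frame
    public Dictionary<int, List<Matrix4x4>> visiblePlacements =
        new Dictionary<int, List<Matrix4x4>>();

    void Update()
    {
        var rp = new RenderParams(sharedMaterial);
        // Conservative bounds so the draws aren't culled away (placeholder size).
        rp.worldBounds = new Bounds(Vector3.zero, Vector3.one * 1000f);
        foreach (var kv in visiblePlacements)
        {
            if (kv.Value.Count == 0) continue;
            Graphics.RenderMeshInstanced(rp, palette[kv.Key], 0, kv.Value);
        }
    }
}
```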
The other thing I’m considering is losing the individual tiles and just using a MeshCombiner or similar to group them into sets of maybe 5x5. Then the cost of disabling is cheaper - on the order of ~100 GameObjects/frame max, and less frequently.
Though if I just do a square of elements, I could actually just enable rows and columns. It does mean some overdraw outside of the circle, but that might be okay… hmmm.
Well, frustum culling exists to cull any mesh that’s outside the screen, and it’s especially effective in games where the camera looks down. And the static batcher exists to merge static meshes that are close to each other. Why not use them?
So the majority of the rendering takes 40 percent of the budget?
And where does the 4ms come from? Are you render-thread bound? Then optimize draw calls.
But if your main issue is GPU render time, then you might be wasting your time.
Frustum culling would include the entire 128x128 map at many angles, including a fairly standard 45 degrees, so… that’s why I’m not using it. Happy to use the SRP Batcher; static batching is not really any good for URP.
4ms comes from the script disabling and enabling MeshRenderers atm.
I do wonder if I could actually use occlusion culling, creating a custom mesh each frame that maps the screen-space shape to occlude those tiles outside the circular area.
So for fun I threw this on the Quest 3 to see how it did, and I can basically render the 50x50 grid (which is not many tris) at ~7ms/frame, which is essentially the whole frame budget for 90Hz. It was peaking at 11ms, but that was when I had an actual light and a more complex shader…
I also realised that in the latest form of the visibility algorithm I’m only updating it when it changes, so the update pressure point has moved since I first looked. In actual fact, what’s killing it on Quest is the culling: having to cull and then SRP-batch 2500 objects is painful.
You might want to try the new GPU Resident Drawer, or BatchRendererGroup.
Can you give some details about the renderer disable/enable script? 4ms looks like a lot to me.
4ms to enable/disable renderers sounds really, really high. I’m used to working on PCs, but even when I was using an old Athlon II X2 (a CPU I bought for $56 in 2009) I didn’t have much trouble enabling and disabling a couple thousand objects every frame. Are you sure the cost is coming from actually enabling/disabling the renderers, and not from an inefficient way of looping through the tiles, or perhaps from also enabling/disabling some other script with heavy code involved?
So I’m going to go with 4x4 blocks, and then I’ll be enabling/disabling no more than ~30 GameObjects each time a row boundary is crossed.
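Roughly like this - a sketch where `blocks` holds the 4x4 combined-block GameObjects, and only the blocks that actually change state get toggled:

```csharp
using UnityEngine;

// Sketch: tiles combined into 4x4 blocks; when the visible window crosses a
// block boundary, only entering/leaving blocks actually toggle.
public class BlockToggler : MonoBehaviour
{
    public GameObject[,] blocks;   // one combined mesh per 4x4 tile block
    public int blocksVisible = 13; // ~50 tiles / 4, rounded up (example value)

    int lastMinX = int.MinValue, lastMinY = int.MinValue;

    public void OnCentreMoved(int blockX, int blockY)
    {
        int minX = blockX - blocksVisible / 2;
        int minY = blockY - blocksVisible / 2;
        if (minX == lastMinX && minY == lastMinY) return; // no boundary crossed

        for (int x = 0; x < blocks.GetLength(0); x++)
        for (int y = 0; y < blocks.GetLength(1); y++)
        {
            bool inWindow = x >= minX && x < minX + blocksVisible &&
                            y >= minY && y < minY + blocksVisible;
            var go = blocks[x, y];
            if (go.activeSelf != inWindow) go.SetActive(inWindow); // only deltas toggle
        }
        lastMinX = minX; lastMinY = minY;
    }
}
```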
Without doing the disabling, the cost of rendering everything and ‘clipping’ to a circle in the shader is ~2ms in the HMD, including the cost of the circular ‘clip’ itself. There’s no culling going on, so it’s actually rendering the entire map and clipping it.
The clip just manipulates the vertex position, which bizarrely ‘just works’ TM, but I clearly don’t want to be rendering every 4x4 section of tiles, because the fragment shader is going to get significantly more expensive than it currently is. I think a little overdraw isn’t going to kill performance, though… I hope…
With some basic geometric shapes it looks like the below - it’s a kind of build-it-with-boxes approach. This is actually rendering ~38 tiles in a circle with just some simple height variations.
The triangle count should be low, but the editor is saying 44K. Of course, this is rendering ALL tiles, so that makes sense, given 128x128 is a whopping 16384 tiles.
It looks like this is going to pan out…
UPDATE: Doing some subdivision and rendering everything came to almost 900K triangles. Cutting this back to just the tiles that cover (slightly over-cover) the circle brought it to ~350K triangles. The former rendered on Quest 3 at ~10ms; the latter at 5-6ms. That’s a bit high, but the triangle counts are extreme, and I seem to be rendering around 48x48 tiles, which seems extreme.
I’ll check out the Resident Drawer & BatchRendererGroup - those sound interesting - but I’m moving away from the mass disabling, I think, as the above suggests I can disable at a coarser level.
I had a similar issue recently: rendering a tilemap, around 15x15 on screen, 200 draw calls and about 25 fps on a seven-year-old mobile phone. I tried many things - slicing, combining meshes, materials - and got it down to 50 draw calls, but still 25 fps.
The thing is, I had a complex shader on the floor tiles that messed up mobile rendering.
What I would suggest is to disable everything and just render your tiles with a plain Lit material - disable lights, custom shaders, animations, everything - and then confirm whether your tile grid is actually the issue.
People sometimes get too fixated on draw calls. I know they matter, but performance is a more complex topic.
As mentioned above, with individual tiles this was putting pressure on the CPU to cull them, and the algorithms needed to ensure I only render the ones in view were costly, because the map could be huge.
This was never about draw calls, though those are definitely important. All these tests were done with a relatively cheap shader, but with an eye on the fact that shader complexity will go up when I add backlighting and shadows, so I need plenty of frame-time headroom to account for this.