Hi guys, I’m trying to understand how to optimize a game as best I can to reduce CPU and GPU temperature and workload.
So far I’ve made a game, and if I put my character in a corner of the map where he can see (and load, I think) the entire map, I have 2.4M vertices.
Starting from this, I decided to do some tests to check the temperature and fps on my 2 PCs.
One is an Acer Predator Helio 300 with an i5-8300H, a 1060 6GB, and 24GB of RAM (8GB stock, plus a 16GB Timetec 2666MHz stick I added); the other is an Acer 5 Pro with an i7-8550U and an MX150.
From the documentation I understand that vertex count, UV mapping, and texture resolution affect the GPU, while batches and material count affect the CPU (tell me if that’s all right).
I did some tests adding trees to increase the vertex count, and these are the results.
So, the relation between vertices and fps is not linear, right? With 20M vertices I can make really high resolution objects and run the game at 60 fps, but if I want to increase performance a lot and hit 144 fps I need to reduce the vertices a lot (to 1.6M) and have “bad objects”, right?
So is the solution for increasing performance to reduce the batches or the vertices? Which one improves performance and reduces temperature more?
Why do I get really high temperatures on the Predator with 1.6M vertices but normal temperatures on the Acer 5 Pro? Does the 70 fps of difference matter that much?
I know that’s a lot of questions, but I’m really curious about this, so if you can help me I would be very happy and grateful. Thank you very much.
Framerate is a terrible metric for understanding rendering performance, because if you use it as your metric, something that is actually increasing linearly in cost will show up as a curve on a graph. Instead you want to look at the time the frame takes to render, usually measured in milliseconds, or 1/1000th of a second.

You also have multiple things affecting performance here beyond just vertex count. As your data already lists, there’s also the number of batches, which is more related to CPU side performance than GPU (though it does affect the GPU as well). That’s how many individual objects the CPU has to tell the GPU to render. A single 1000 vertex mesh probably takes less time for the GPU to render than it takes the CPU to tell the GPU what to do next, meaning there’s potentially a lot of time the GPU is effectively idle.

Then there’s the question of what’s actually visible in the rendered view, as having a lot more triangles can mean significant rendering cost increases from overshading and micro-triangles. Two meshes that both have 1M triangles and the same material can end up taking significantly different amounts of time to render depending on how those triangles appear on screen. Whether all 1M triangles are limited to some 50x50 pixel region of the screen, vs. taking up most of the screen, vs. being mostly off screen can have a significant impact on the rendering cost. And not always in the way you’d think (the 50x50 pixel case is probably the slowest!).
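To put rough numbers on that first point, here’s a tiny standalone sketch (the frame costs below are made up purely for illustration) of why a linearly growing cost shows up as a curve when you only look at fps:

```csharp
// Not Unity-specific: just converting between fps and milliseconds per frame.
using System;

class FrameTimeDemo
{
    static double MsPerFrame(double fps) => 1000.0 / fps;

    static void Main()
    {
        double baseMs = MsPerFrame(144.0);   // ~6.9 ms per frame at 144 fps
        double extraMs = 7.0;                // pretend extra geometry adds a flat 7 ms

        Console.WriteLine($"{baseMs + extraMs:F1} ms -> {1000.0 / (baseMs + extraMs):F0} fps");
        // ~13.9 ms -> ~72 fps. The same +7 ms added to a 33.3 ms (30 fps) frame
        // only drops it to ~25 fps, so a linear cost looks like a wildly
        // different fps hit depending on where you start.
    }
}
```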
Then there’s heat. This is also a terrible way to understand rendering performance. Two computers even with identical specs on paper may run at completely different temperatures depending on a whole range of factors.
On the hardware side, the cooling hardware (fans & radiators) might be different, or even if it’s the same, one might have a fan that’s not working properly due to dust, age, or damage. Or the thermal compound on the CPU or GPU could have gotten too hot for too long at some point in the past and dried out, making it conduct heat less effectively.

But the difference between an Acer 5 Pro and a Predator Helio 300 in terms of their cooling is significant. The 5 Pro likely has a single fan and heat pipe handling both the CPU & GPU to cool a combined power draw of maybe 40 watts. The Predator Helio 300 has at least two fans and multiple heat pipes, some of which are dedicated to either the CPU or GPU, and it’s dealing with a combined power draw closer to 125 watts. That’s a massive divide. Plus both laptops will adjust their fan speeds once they reach a certain temperature and power load to try to keep the temperature stable. The fact that both end up at roughly similar temperatures for the GPU is a credit to the engineers designing consistent cooling hardware.

And that’s just the hardware; there are lots of software related reasons two seemingly identical computers doing the same task can run at different temperatures. But I’m getting too deep into things.
But it comes down to this: when comparing the same scene between two systems, you can really only determine the performance difference of that scene on those two systems. While lots of benchmarks out there like to discuss the performance difference of some specific rendering feature, vertex count, fill rate, shader complexity, etc., the reality is modern GPUs are way too complex for that to always be meaningful. A GPU that may be “faster” at all of those single data points may still end up being slower, and significantly so, in a real game, because the interactions between things and how the GPU’s physical processing is divided up are important.
As a very simplified example, GPUs in the past used to have separate dedicated hardware for processing vertices and rendering pixels. This meant a GPU could render “x vertices per second” and “y megapixels per second”, and for the most part one didn’t affect the other. At some point GPUs moved to a “unified architecture”, which means the vertices and the pixels use the same parts of the hardware. That meant those “x vertices/s” and “y Mpixels/s” stats weren’t really useful metrics anymore, since how many of one the GPU had to handle reduced how many of the other it could do. Again, this is a super simplified example that’s skipping over a ton of nuance and specifics, but we continue to see similar results today where GPU A claims some massive lead over GPU B in some impressive sounding metric (TFLOPS is the latest example), but in real games the “slower” GPU runs away with the prize.
So, back to your actual post.
Yes, this is roughly accurate. The mesh data (which includes the vertex positions and UVs) and textures are uploaded to the GPU from the CPU at some point, and after that the CPU doesn’t even keep them in local CPU memory anymore. It just has a reference number it uses when it tells the GPU it wants to render them. So a 100 vertex mesh is no different from a 1M vertex mesh to the CPU after that.
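If you want to lean into that, here’s a minimal sketch of doing it explicitly for a mesh and texture created at runtime (for imported assets the “Read/Write Enabled” import setting controls the same thing; the component lookups here are just for illustration):

```csharp
using UnityEngine;

public class UploadExample : MonoBehaviour
{
    void Start()
    {
        Mesh mesh = GetComponent<MeshFilter>().mesh;

        // Push the vertex data to the GPU now and let Unity drop the CPU-side copy.
        // After this the CPU only holds a handle it passes along with draw calls.
        mesh.UploadMeshData(markNoLongerReadable: true);

        // Same idea for a texture that still has a CPU-side copy.
        var tex = GetComponent<Renderer>().material.mainTexture as Texture2D;
        if (tex != null && tex.isReadable)
        {
            tex.Apply(updateMipmaps: false, makeNoLongerReadable: true);
        }
    }
}
```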
As mentioned at the start of this, fps is a terrible metric. And no, the relation between vertex count and fps is not linear. Though from the limited data you provided, the vertex count to frame time relation is surprisingly close to linear. Just be mindful that the framerate you see in the editor’s profiler & stats window is really only reliably showing you the CPU cost of a frame. Even Unity’s “rendering” profiling times are the CPU times, not the GPU. The actual GPU time is harder to get and requires specialized GPU profiling tools. Something like FRAPS is showing you how many frames per second are being rendered to the screen, but that doesn’t tell you if 90% of that time is the CPU or the GPU.
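One in-engine option for getting a rough split between CPU and GPU time is Unity’s FrameTimingManager, sketched below. It assumes “Frame Timing Stats” is enabled in Player Settings and it isn’t supported on every platform or graphics API, so treat it as a sanity check rather than a replacement for a dedicated GPU profiler:

```csharp
using UnityEngine;

public class FrameTimeLogger : MonoBehaviour
{
    readonly FrameTiming[] timings = new FrameTiming[1];

    void Update()
    {
        FrameTimingManager.CaptureFrameTimings();
        if (FrameTimingManager.GetLatestTimings(1, timings) > 0)
        {
            // Both values are in milliseconds for the most recently completed frame.
            Debug.Log($"CPU: {timings[0].cpuFrameTime:F2} ms  GPU: {timings[0].gpuFrameTime:F2} ms");
        }
    }
}
```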
Really, the best way to reduce the temperature of either system is going to be to lift it off the desk / table it’s on by some amount so the fans can breathe, and maybe sit an ice pack on top. The next best way is to reduce the amount of work it has to do. That can be done by limiting the framerate to 30 (if you’ve ever played a mobile game and noticed it has a battery saver option, that’s what it’s doing), or by reducing the resolution. Or just having less stuff. I mean, obviously. But having graphics options that disable some less important objects, or using LODGroups to switch to lower vertex meshes or completely hide objects when they get too small on screen, are alternative ways to handle that.

But there’s not really any one perfect answer to this question. The range of PCs out there is too great to ever really be able to perfectly optimize a game across the entire spectrum of options. Micro-optimizations of the kind it sounds like you’re aiming for only really make sense when you have a fixed hardware platform, like a console.
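As a minimal sketch of the “do less work” knobs mentioned above (the 30 fps cap and 75% resolution scale are arbitrary example values):

```csharp
using UnityEngine;

public class PowerSaver : MonoBehaviour
{
    void Start()
    {
        // VSync overrides targetFrameRate, so turn it off before capping.
        QualitySettings.vSyncCount = 0;
        Application.targetFrameRate = 30;

        // Render at 75% of the display's native resolution, still fullscreen.
        int w = (int)(Display.main.systemWidth * 0.75f);
        int h = (int)(Display.main.systemHeight * 0.75f);
        Screen.SetResolution(w, h, FullScreenMode.FullScreenWindow);
    }
}
```

LODGroups themselves are usually set up on the objects in the editor rather than in code, so they’re not shown here.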
May I ask a related question? Recently I was testing shaders; let’s take, for example, a brute force outline post FX shader, which does a lot of texture sampling. Increasing the outline width affects the CPU main thread rather than the render thread in the stats window. Why is that? How is the number of texture lookups in a fragment program related to CPU time?
If you look in Unity’s profiler, you’ll find the thing taking the most time listed as Gfx.WaitForPresent. That’s the CPU asking “hey, GPU, how are you doing?”
And the GPU responding with “still busy here, hold on.”
So the CPU thread sits back and waits a bit until the GPU isn’t busy. This might seem odd, since if the GPU is going to be busy for a while, that seems like a fine time for the CPU to be doing stuff too, getting ready to hand more work to the GPU. The problem with doing that is you don’t know how long it will be before the GPU is done, so you might finish the CPU side game update and then have to sit idle again for a while, and by the time the GPU is ready you’ve added a ton of latency. Plus, on some systems, having the CPU spin up to do work while the GPU is already obviously maxed out will just slow down the GPU (due to thermal and power limitations) and make it take even longer.
It’s also why it’s important to use explicit GPU profiling tools when judging GPU side rendering performance. Unity won’t always wait, even when the GPU is still busy, so the CPU side time can fluctuate wildly even if the GPU render time is actually quite consistent.