Graphics Drivers and Computer Stability in Unity

I have two mid-sized games whose ongoing development has been severely hampered by Unity crashing to desktop, mostly silently. The application just disappears, with no bug reporter, while idling in Edit mode. I have a Power Computing gaming rig with an i7 processor, 32 GB of RAM and a GTX 1080. Countless hours of investigation show this has been happening to Unity games and applications for quite a while, as any Google search will verify. I have done all the necessary steps…updated to the latest drivers and tried many combinations of settings in the control panels, Device Manager and the system display settings, and the crashes will not go away. Drivers are updated from Device Manager and GeForce, and any game overlays have been uninstalled. I have done everything the bug QA team specified, and more.

Some of the symptoms displayed are:
- Models get what looks like a vertex or vertices popping, displayed as a black spike that extends a meter or two off the surface of the mesh for a single frame.
- Screen tearing, where the Edit or Play viewport (but not the editor UI) gets a white tear across it.

My research into resolving this has led to innumerable short threads of complaints on reddit and the nVidia discussion forums, a few blog posts and similar. Most were by gamers, and where they were by Unity devs, not enough information was relayed for me to make any kind of decision about how to stabilize this. I note that none of the principals' tech support folks ever chimed in on any of these threads. It is like they are ignoring it and hoping it goes away on its own. Assassin's Creed and something something Tarkov are two large Unity games with innumerable complaints that seem exact replicas of this ongoing issue of silently crashing within seconds or minutes of opening the application. Many more were encountered with the core complaint being Unity…though of course my research was biased by the keywords used in said search.

I have noted that many gamers who use AMD graphics hardware do not have these complaints. I am hoping to compile data in this thread that will do one of three things: lead to a resolution of the nVidia graphics driver issue; assure me that the issue will go away if I get the investor to foot the bill for a high-end AMD card; or force the triumvirate of Microsoft, Unity and nVidia to focus their combined resources on pinpointing the issue and providing a proper and permanent resolution to the umpteen problems encountered by Unity devs and end-product consumers.

So, please feel free to chime in and provide experiences, stable driver versions or “nope…never experienced this on AMD” or whatever you folks can contribute that may lead to resolution of this multi-year ongoing issue.

Next steps would be:

  1. Check if the problem exists on another computer with same configuration.
  2. Check if GPU is overheating.
  3. Stress-test the system/GPU.
  4. Check if there are error messages in system error log (event viewer).

Also…

Would be nice to see a screenshot of that happening. Additionally would be a good idea to know if this happens to skeletal meshes or all of them.

Is the CPU or GPU overclocked?
How hot are the CPU and GPU running when you see the problem?

The GTX 1080 is a popular card. It can run a bit hot, though. Check the temp with CPUID HWMonitor. That tool will show temps for CPU and GPU.

I personally run an Nvidia card in each of my PCs. I have never run into the problems you are describing. I have used Unity with several different Nvidia cards over the years including GTX 780, GTX 1070, GTX 1080ti, and RTX 2080ti.

It is possible your specific card may be having an issue. I doubt there is a widespread Nvidia-specific issue. I don’t know if getting another card would solve your issue or not.

It happens to a skeletal mesh and to a motorcycle mesh with no rigging in Tail Of The Dragon. It also happens with the Unity terrain system and trees. It happens in the Cyclotronica game with tunnels of about 16K polys and with particles, and the designer says it sometimes won’t load on the PC before crashing, due to some package manager conflict that the bug QA team blew him off about… Both of these crashes also happen on another PC and, less frequently, on a new Mac laptop. The system is water-cooled, and the crash will happen after the machine has been shut off all night and just restarted. Unless the cards get hot in 5 minutes I do not think it is heat related…but it possibly could be. Checking the CPU and GPU with the system tools shows nothing incompatible or wrong.

A GPU or CPU can get hot in a few minutes, and cooling being attached does not necessarily mean the chip is being cooled. Basically, you need to get sensor readings to see what’s going on. With air/radiator cooling in the past I frequently ran into situations where the thermal paste/pad would dry out, and that would result in overheating.

Would be nice to see a picture of the glitch you’re describing on the bike, as long as you are completely sure the bike is not somehow a skinned mesh. Of course, unless there’s an NDA involved. In case of skinned meshes spikes tend to appear in case of busted skin weights, or incorrect transform. Another possibility is having a NaN somewhere in object transform.

However, I do not have a 100% sure idea what it could be.

The system can be booted after a night of sitting off and still exhibit the issue within five to ten minutes of booting and immediately loading Unity. Thanks for the tip on CPUID HWMonitor. I have been searching for such a tool but got bunches of crap based on my keyword searches. The entire system is water cooled. The motherboard is a Gigabyte board. The card looks like it has chrome pipes so I am assuming it is water cooled as well. How do I underclock it? I have not been able to find that info either.

The searches on my problem bring up innumerable problems with Unity games crashing on the GTX 1080 with no driver resolution.

I just want to work. I get on a roll and have all this data in my brain, get slammed over and over with silent crashes one after the other and then forget what the hell I was doing.

I have rebuilt the bike, optimized the mesh and exported it as an FBX from C4D R20. I made sure there were no stray points. As I ponder this, I believe that whether a crash comes from a bad shader or a NaN error, it should not silently crash the application. It should throw a console error and handle the exception. Is there some reason Unity cannot do that?

Hard to get screenshots as it happens in a split second and may or may not crash silently.

Never heard of thermal paste. How and where the heck would you apply it?

Just downloaded the CPUID monitor mentioned above. GPU is at 41C (105F). The fan is currently at 0 RPM. Is that correct at a low temp, with the fan only kicking on at higher temps, or should I be concerned about that?

A NaN would not crash the application, but it could produce the black spikes you were talking about. Also, encountering a NaN does not necessarily produce an error aside from artifacts. If a NaN is fed into a shader or appears inside a shader, that should not result in an exception being thrown.
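To illustrate the NaN point, here is a minimal Python sketch (not Unity code) of how exported vertex positions could be scanned for NaN or infinite components. The function name `find_non_finite_vertices` and the list-of-tuples layout are assumptions made for illustration only; in practice the positions would have to be read out of the mesh or the FBX first.

```python
import math


def find_non_finite_vertices(vertices):
    """Return the indices of vertices with a NaN or infinite component.

    `vertices` is assumed to be a sequence of (x, y, z) tuples of floats.
    """
    bad_indices = []
    for index, position in enumerate(vertices):
        # math.isfinite is False for NaN, +inf and -inf.
        if not all(math.isfinite(component) for component in position):
            bad_indices.append(index)
    return bad_indices


if __name__ == "__main__":
    # Hypothetical sample data: the second and third vertices are broken.
    sample = [
        (0.0, 1.0, 2.0),
        (float("nan"), 0.0, 0.0),
        (1.0, float("inf"), 3.0),
    ]
    print(find_non_finite_vertices(sample))
```

A clean result from a check like this would not rule out a NaN produced at runtime (for example in a transform or inside a shader), so it only covers the static mesh data.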

If capturing screenshots is difficult, a video could do. Then you could look through the video for the point where the spikes appear. Once again… if it isn’t “under NDA” stuff and you’re comfortable doing that… OBS Studio can record your desktop and is available for free.

Regarding the crash, you’d need to investigate the locations where Unity writes crash logs in hopes that something is written there. In the best case you could get a stack trace with symbols in C# code. In the worst case you’ll get a stack trace into a deep part of the engine with no symbols, or no trace at all.
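As a rough aid for that log hunt, here is a small Python sketch that scans a log file for lines that look crash related. On Windows the Editor log normally lives at `%LOCALAPPDATA%\Unity\Editor\Editor.log`, but verify that path on your own machine. The marker strings and the function names `find_crash_lines` and `default_editor_log_path` are assumptions chosen for illustration, not an official list of what Unity writes.

```python
import os
from pathlib import Path

# Assumed, illustrative markers; adjust after looking at a real log.
CRASH_MARKERS = ("crash", "stack trace", "access violation", "device removed")


def find_crash_lines(lines, markers=CRASH_MARKERS):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


def default_editor_log_path():
    """Build the usual Windows Editor.log path from LOCALAPPDATA."""
    local_app_data = os.environ.get("LOCALAPPDATA", "")
    return Path(local_app_data) / "Unity" / "Editor" / "Editor.log"


if __name__ == "__main__":
    log_path = default_editor_log_path()
    if log_path.is_file():
        with log_path.open(encoding="utf-8", errors="replace") as handle:
            for number, text in find_crash_lines(handle):
                print(f"{number}: {text}")
    else:
        print(f"No log found at {log_path}")
```

Since the Editor starts a fresh log on each launch, it is worth copying the log (or checking the previous-session copy next to it) right after a silent crash, before reopening Unity.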

Thermal paste is applied between GPU or CPU and their radiator. Depending on your region, Thermal Pad may be more common.
https://en.wikipedia.org/wiki/Thermal_paste
Apparently it can be called thermal gel or heat sink compound. Some brands can dry out, in which case they lose (heat) conductivity and CPU temp goes up until the paste is replaced. However, I believe that brands available in your area would be of higher grade, and the chances of it drying out are low. An indication of thermal gel issues used to be a CPU (CPU, not GPU) temp in the ballpark of 70 degrees Celsius after a cold boot while idling. Usually such a system would reach 90 degrees under load and quickly perform a safety shutoff. GPU overheating, in my experience, could result in visible artifacts, texture corruption, or a system-wide freeze.

It is a good low temp and would rule out dried thermal gel. However, I’d try to stress test the GPU under load to see if that crashes it. I think @Ryiah could give good advice on it, as she seems to be more enthusiastic about hardware than I am, but apparently FurMark, “Heaven” and “Superposition” are popular choices.

I had a gt9800 with just a passive cooler that could overheat and lead to visual glitches. In that case it usually was textures looking super glitched out, and then shortly after the game crashed.

Unless you see the watercooling being connected to it, those pipes might just be regular heatpipes that are part of a normal cooling solution.

Yep, FurMark is good for a GPU heat stress test as far as I know.

Just a word of caution: some thermal pastes contain electro-conductive components, and you need to be careful not to apply it too thick or to the wrong parts. Less is more. If you don’t need it, probably better not to mess with it.

Yep. I’ve been picking non-conductive paste, but if it isn’t overheating, best not to touch the paste.

CPUID had a link to other software that finds drivers. It found an nVidia driver that the system and the nVidia control panel could not find yesterday. I did a clean install and watched it remove ShadowPlay and a bunch of GeForce crap that previous clean installs, and uninstalling from the system software, had supposedly removed but apparently had not. I now have Unity open on a huge 100 terrain scene. Temp is at 50-58C (122-136F) and the fan has kicked on, reading 0 min / 1162 max RPM.

The driver software found a bunch of Intel chipset drivers out of date. Scares the eff out of me to update that stuff.

Thanks for all the tips and advice so far. I really appreciate it folks! I had been a Mac user for 20+ years and only got this honking PC in 2017 from the folks who wanted the Tail Of The Dragon road game made. I do nothing on it except Unity dev and web searches for said dev. I am learning much about the system just from the tips and software pointed out here.

Yolo! I mean… make a system backup before you do that.

A couple years ago I had a good experience with one of those fully automated driver update tools. I just unselected a couple that I didn’t want it to mess with, like the wacom driver, and I let it do its thing and that fixed a recurring but rare BSOD crash issue that came from a component for which I could just not find a driver on my own. And my google-fu is pretty strong, I have no idea why I didn’t find it. The tool was “driverfusion” but afaik it’s no longer actively maintained, so I can’t recommend it anymore.

I believe ShadowPlay installs with “GeForce Experience” and I always uncheck that when I install new Nvidia drivers.

I have a huge forest scene in the Editor Scene view and still see small black squares popping and random diagonal white tears across the viewport but nowhere else. GPU temp is currently at 48C/118F and the max was 61C/141F. Any values I should report back from the CPUID that may help diagnose this?

You should try recording the artifacts so other people could get a better idea of what’s going on… once again, if NDA allows that.

Otherwise I’d recommend to stress test the GPU with FurMark or anything else just in case.

Is it more usable after driver update now? Did it stop crashing?

Does stuff like this happen in the Unreal Engine Editor or games as well? How about other GPU-based 3D software? The temperature is fine, and if it was a hardware issue, I’d expect to see it across all engines, not just Unity.

I had a few more crashes earlier. It seems to be stabilized now. I can work in C4D on very large files. The tearing and glitches are limited to the viewports in Unity only. Wish I knew more. I see that 3 of the 8 CPU cores maxed out at 100%. CPU clocks are hovering between 3990 and 4200. The Nvidia graphics clocks: Graphics maxed at 1898 and is now at 139, Memory maxed at 5006 and is idling at 405, and Video maxed at 1686 and is idling at 544. Frame Buffer maxed at 60% and is idling at 9%. The built-in Intel 530 graphics chip clock maxed at 1101 and is idling at 0. Is there any data from the CPUID readouts that would be pertinent for me to share?

I have seen CPU and GPU issues over the years where the thermal paste went bad and stopped working effectively. When that happens, you can get an instant overheat regardless of the type of cooling you use. I have even seen some i7-8700k CPUs that had terrible thermal paste issues under the IHS, and those could overheat in seconds even with really nice water cooling setups. De-lidding was the only real fix for that situation.

You can underclock your CPU through the BIOS. Simply reduce the max multiplier for the CPU.

You can underclock your GPU with the MSI Afterburner tool. The Afterburner tool is what many people use to overclock Nvidia cards. It can also be used to underclock the cards.

I think those temps are fine. Usually on Nvidia cards, you want to worry if the temp is over 80C and some people even get away with up to 90C. The Nvidia cards will actually throttle themselves to protect against damage from overheating. If your max is only 61C, then you are not getting into the throttling or shutdown thresholds.
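As a quick sanity check on readings like these, here is a minimal Python sketch that converts Celsius to Fahrenheit and buckets a peak GPU temperature against the rough 80C and 90C figures mentioned above. The function names and the bucket labels are made up for illustration, and the thresholds are only the rule-of-thumb numbers from this post, not vendor specifications.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def classify_gpu_temp(max_celsius, worry_at=80.0, upper_limit=90.0):
    """Bucket a peak GPU temperature using rule-of-thumb thresholds.

    worry_at and upper_limit are the approximate 80C / 90C figures from
    the discussion, not official limits for any specific card.
    """
    if max_celsius > upper_limit:
        return "too hot"
    if max_celsius > worry_at:
        return "worth watching"
    return "fine"


if __name__ == "__main__":
    peak = 61.0  # the max GPU reading reported in this thread
    print(f"{peak}C = {celsius_to_fahrenheit(peak):.1f}F -> {classify_gpu_temp(peak)}")
```

The same check could be run on a peak reading taken during a stress test, which is where an overheating card would be expected to show up.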

What is de-lidding? Also, using the driver app linked from CPUID, I found a number of drivers for the Intel chipset that were older. Would this be causing the issues I am seeing? The motherboard is a Gigabyte branded board. Seems pretty tough…we get power glitches up here, being at the end of the powerlines right before national forest. The Mac shuts off and stays off, but the PC will hold on through a glitch of less than a second or two and not shut down. I also had Oculus software, which the SOBs make near impossible to shut down or uninstall. I see a couple of Oculus audio drivers. I noted that many gamers got rid of crashes by shutting off VR-related processes. BTW…thanks again to the folks on this thread for giving me an education in the guts of the system. Trying to find tools for the PC seemed to just run into malware red flags every time I searched.