Plain C# classes executing much more slowly in standalone version than in Unity editor

Just wondering if anyone has any idea of what might be happening.

I've been working on a football game and I have a screen where the user can choose to simulate the week's games. When the user chooses to sim a game, I'm using my normal match engine, but I've disassociated all graphical elements - like a model/view/controller with no view - until the game is over and the results are being reported.

When the game is being simmed, there is nothing happening on screen. 99% of the logic is happening in non-GameObject C# classes. I have no game logic in Update() and related methods, no tweening happening. I'm doing this to get maximum speed and to detach game logic from the usual frames per second type throttling/calculations that go on in Unity.

When I execute in the Editor, I can simulate a football game in about 1 minute or 0.3 seconds per play. The same play can take 20 seconds to display when a player is playing the game out, so that part is working fine.

I was pretty sure that building a Windows executable and running it standalone would lead to better performance, but the exact opposite happened. Simulating games in my standalone build is taking about 2.5 minutes, two and a half times as long as in the Editor. I've tried 64-bit builds and they had no effect whatsoever.

Has anyone else had a similar experience with an app that does more background calculations than on-screen animations? Maybe someone's tried to make a chess game or something with a long running AI in Unity and has seen this?

Are you using the Mono or IL2CPP scripting backend? If you're using IL2CPP, is the C++ Compiler Configuration in Player Settings by any chance set to Debug?

Everything's pretty much on default settings right now.
Scripting backend is set to Mono.

I'd suggest attaching a profiler and looking there at what's taking longer.

Thanks for helping.
I just experimented with an IL2CPP build rather than Mono (no debug flag, and set for faster execution) and there was no effect on my simulation time: it's still up at 2.5 minutes rather than the 1 minute I get in the Editor. I'll follow up after I've checked things with the profiler.

I experimented briefly with the Unity profiler back in the 2019 Unity version and while there was lots of info returned, I was never really able to get any execution statistics for any of my own code. Just curious if I was missing something back then or whether that may have been a very early version of the profiling tools.

I'll take a look.

The profiler in Unity 2019 was fine. If you cannot find your own code there, perhaps it's not your code that's taking long to run? Perhaps it's rendering/engine taking longer in the player for whatever reason - maybe you're rendering at a much higher resolution?

1280x800 everywhere.
Looks like I'll have to do a barrage of tests to try to get to the bottom of this. I'll post findings when I figure it out.

Happy to say that I figured out what was causing the slowdown. I was brute forcing my way through all of the Unity Player settings and finally hit a build that had great performance, and then I worked my way back through all the changes I made to isolate the guilty setting.

It ended up being the "Use DXGI flip model swapchain for D3D11" setting.

When you deselect this option, you actually get a warning telling you that the player will fall back to something slower and less efficient, but in my case the opposite happened. My performance jumped by a factor of 10!

Simulating a football game in the Unity Editor: 60 seconds
In the standalone Unity Player with DXGI flip model ON (default): 150 seconds
In the standalone Unity Player with DXGI flip model OFF: 15 seconds

I just did a tiny bit of reading up on this option and it doesn't make a lot of sense. This is a screen drawing option, and as I mentioned earlier the algorithm I was testing has no screen drawing happening at all - it is purely logic. There's no reason for logic performance to be hit so heavily by this option.

That seems bizarre. I'd really be interested in seeing what the profiler shows.

I just did a run in the profiler.
To be honest, it's information overload for me. Normally in the profiler you want to compare frames, right? My code doesn't care about frames, though, so it's hard to know where to start and where to stop looking on the time axis.
I have no idea what in there you would be interested in seeing and I wouldn't know what to look for myself either.

If I got two builds (they're only 30 MB) over to you, I'm guessing you wouldn't be able to profile those without the source?
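In Unity itself, `Profiler.BeginSample`/`Profiler.EndSample` or `Unity.Profiling.ProfilerMarker` (via `Auto()`) label your own code in the Profiler window. For logic that doesn't care about frames, a plain stopwatch can also answer the key question directly: how much of the wall-clock sim time is spent in your code, and how much is spent waiting. This is a minimal sketch; `SectionTimer` is a hypothetical helper, not a Unity API.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: accumulates time spent inside marked sections of your
// own code so it can be compared with the total wall-clock time of a sim.
// Not reentrant: a nested Measure() stops the clock when the inner scope ends.
public sealed class SectionTimer
{
    private readonly Stopwatch _stopwatch = new Stopwatch();

    // Total time spent inside all Measure() scopes so far.
    public TimeSpan Accumulated => _stopwatch.Elapsed;

    // Usage: using (timer.Measure()) { /* play resolution logic */ }
    public Scope Measure() => new Scope(_stopwatch);

    public readonly struct Scope : IDisposable
    {
        private readonly Stopwatch _sw;

        public Scope(Stopwatch sw)
        {
            _sw = sw;
            _sw.Start(); // Stopwatch keeps adding to Elapsed across Start/Stop
        }

        public void Dispose() => _sw.Stop();
    }
}
```

Wrapping the play-resolution code in `using (timer.Measure())` and comparing `timer.Accumulated` with a second stopwatch around the whole sim would show whether the extra time is going into the logic or somewhere else.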

Perhaps that’s the issue? How does your code work if it’s not during frames?

You can save off profiler traces into profiler files, that way you don’t have to send over binaries or builds.

[quote=“Tautvydas-Zilys”, post:11, topic: 897916]
Perhaps that’s the issue? How does your code work if it’s not during frames?
[/quote]

If a player is actually playing out the match, then yes, I have GameObjects moving around on screen, and those objects are animated and constrained by frames per second.

If the player just wants to see a quick result for a match, then I have no game objects on screen and the only thing that is happening is underlying simulation logic.

It’s relatively easy to detach yourself from the usual frame driven logic by just not using Physics and not placing any match simulation logic in Update() methods. Screen redraws are still happening of course, but the only thing on screen is a static scoreboard that only gets updated when the match is over.

I’ll try to post the two profiler files a little later.

The zip file contains 2 profile runs of the same operation on two different standalone versions of my prototype. One was built with DXGI flip model set to ON and the other set to OFF, as indicated by the filenames. No other settings were different. Profiles were taken with the external profiler outside of the editor.

DXGI Flip = off
takes approximately 15 seconds to run and it recorded 7539 frames.

DXGI Flip = on
takes over two minutes to run and it recorded ~15-16k frames
It's also interesting that the profiler file for the slower/longer run is actually much smaller than the faster run.

In case it helps, my game logic is mostly included in two classes, Match and GameControl, so if you see those names in the traces, those are mine.

I'm also attaching two screenshots, so you can have some context as to what type of prototype this is and what exactly I captured in the runs:

The operation I'm focusing on is when the user selects Sim Week from this schedule screen. The profiler runs contain data from a single game being simmed.

If we are not simulating a game from the schedule and a player is actively coaching a team, I'm tweening simple placeholder representations of the football players on a grid playing field. This is the aspect of the game that gets removed when the user sims and I expect execution to be very, very fast for those simmed runs.

Hope there's something interesting in there.

Are you using a lot of async/await code in your simulation code? Every time you "await" while running on the main thread, it will not continue the execution until the next frame. So if your frame rate is lower for whatever reason, your code will take longer to run. That seems to be the case in the traces.
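To see how awaits can tie pure logic to the frame rate, here is a plain .NET sketch (no Unity required) that imitates a main-thread context which only runs posted continuations once per frame. `FrameSyncContext`, `FrameAwaitDemo` and `FramesNeeded` are hypothetical names; Unity's own `UnitySynchronizationContext` is internal to the engine and may differ in detail. Note that awaiting a task that has already completed continues synchronously, so only awaits that actually suspend (for example `Task.Yield()` or `Task.Delay`) pay the per-frame cost.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a main-thread SynchronizationContext: posted
// continuations run only when the "engine" drains the queue, once per frame.
public sealed class FrameSyncContext : SynchronizationContext
{
    private readonly Queue<(SendOrPostCallback Callback, object State)> _pending =
        new Queue<(SendOrPostCallback Callback, object State)>();

    public override void Post(SendOrPostCallback d, object state)
    {
        lock (_pending) _pending.Enqueue((d, state));
    }

    // Run only the continuations queued before this frame started; anything
    // they post in turn has to wait for the next frame.
    public void RunOneFrame()
    {
        int count;
        lock (_pending) count = _pending.Count;
        for (int i = 0; i < count; i++)
        {
            (SendOrPostCallback Callback, object State) item;
            lock (_pending) item = _pending.Dequeue();
            item.Callback(item.State);
        }
    }
}

public static class FrameAwaitDemo
{
    // Pretend simulation: each play awaits once, like an animation hook would.
    private static async Task SimulatePlays(int plays)
    {
        for (int i = 0; i < plays; i++)
        {
            // ... play resolution logic would run here ...
            await Task.Yield(); // posts the continuation to the current context
        }
    }

    // Returns how many frames had to run before the simulation finished.
    public static int FramesNeeded(int plays)
    {
        var previous = SynchronizationContext.Current;
        var ctx = new FrameSyncContext();
        SynchronizationContext.SetSynchronizationContext(ctx);
        try
        {
            Task sim = SimulatePlays(plays); // runs synchronously up to the first await
            int frames = 0;
            while (!sim.IsCompleted)
            {
                ctx.RunOneFrame();
                frames++;
            }
            sim.GetAwaiter().GetResult(); // surface any exception from the sim
            return frames;
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

With 5 awaited plays the loop needs 5 frames no matter how fast the logic is; at 75 FPS that is already about 67 ms of pure waiting.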

Do you by any chance have a variable refresh rate monitor?

Yes, I'm using awaits whenever I have a method that could potentially play an animation or ask for user input via dialog. A lot of the awaited code is probably getting bypassed when simming, but it's very possible that there's still quite a bit getting hit.

It looks like my monitor has a 48-75Hz refresh rate.

And that solves the mystery. Turning off flip model disables support for variable refresh rate displays and thus makes your build run at a much higher frame rate. With flip model on, it engages your monitor's variable refresh rate technology and limits the frame rate to 75 FPS (to match the refresh rate). Normally this would be a great thing because it reduces input latency and eliminates tearing. However, your use of await causes your simulation to be delayed by N frames, where N is the number of awaits. Since flip model runs at a capped frame rate, each frame takes a little longer, so your awaits are delayed by a significantly larger amount of time and your entire simulation takes longer.

If you limit your framerate to some very low number, for instance:

```csharp
QualitySettings.vSyncCount = 0;    // turn off vsync so targetFrameRate is honored
Application.targetFrameRate = 30;  // cap the player at 30 FPS
```

It will expose the flaw in your simulation code in both cases, including in the editor.
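The explanation above can be summarized as a rough model, assuming each await that actually suspends resumes on the next frame, $f$ is the frame rate, $N_{\text{await}}$ is the number of suspending awaits in a sim, and $T_{\text{logic}}$ is the time the logic itself needs:

$$
T_{\text{sim}} \approx T_{\text{logic}} + \frac{N_{\text{await}}}{f}
$$

Under the 75 FPS cap each suspending await costs about $1/75\,\text{s} \approx 13.3\,\text{ms}$. The flip-model-off run recorded 7539 frames in about 15 s, roughly 500 frames per second, so the same await costs only about 2 ms there. The frame rate enters only through the second term, which is why a setting that changes presentation can change how long logic that draws nothing takes to finish.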

Thanks so much. Never expected to get SO much useful info from this question.
I usually do strategy/board games with fixed or slowly moving cameras, so it's probably safe for me to keep that setting off permanently.
This is actually my first project using async/await. Coroutines would have the same problem on each yield. It looks like the solution, if I really wanted maximum speed, would be to write a synchronous version of the engine. I probably don't want to do that, for duplication and maintenance reasons.
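One way to avoid maintaining a second, synchronous engine (a suggestion, not something confirmed in this thread) is to route every "might animate or prompt the user" point through a presentation hook, and give quick sims a hook that returns an already-completed task. Awaiting a completed task continues synchronously, so no frame is spent. `IMatchPresenter`, `HeadlessPresenter` and `MatchSketch` are hypothetical names for illustration.

```csharp
using System.Threading.Tasks;

// Hypothetical presentation hook the match engine awaits wherever it might
// play an animation or ask the user for input.
public interface IMatchPresenter
{
    Task ShowPlayAsync(int playIndex);
}

// Headless presenter for quick sims: returns an already-completed task, so
// the engine's "await" continues synchronously instead of waiting a frame.
public sealed class HeadlessPresenter : IMatchPresenter
{
    public Task ShowPlayAsync(int playIndex) => Task.CompletedTask;
}

public static class MatchSketch
{
    // One engine for both modes; only the presenter passed in differs.
    public static async Task<int> RunAsync(IMatchPresenter presenter, int plays)
    {
        int resolved = 0;
        for (int i = 0; i < plays; i++)
        {
            resolved++;                        // stand-in for real play resolution
            await presenter.ShowPlayAsync(i);  // no-op when headless
        }
        return resolved;
    }
}
```

This only removes the frame dependency if nothing else on the sim path awaits something that genuinely suspends, such as `Task.Yield()` or `Task.Delay`.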

Disabling flip model swapchain will always result in a worse experience for the player, no matter the game type, so I do not recommend it. Also, it seems that your code only functions correctly if the game runs at hundreds of frames per second: that is both wasteful and not guaranteed to happen on everyone's machines.

Do you have to run the simulation on the main thread? Does it interact with the scene/other objects at all? Perhaps you could offload it to another thread, in which case async/await wouldn't be tied to frame rate?
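A minimal sketch of the offloading idea, assuming the simulation path calls no Unity APIs (most of them may only be used from the main thread). A thread-pool thread has no Unity synchronization context, so awaits inside the simulation resume on the pool instead of waiting for the next frame. `BackgroundSimSketch`, `SimulateMatchAsync` and `RunOffMainThread` are hypothetical names.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundSimSketch
{
    // Hypothetical pure-logic simulation: no UnityEngine calls allowed here.
    private static async Task<int> SimulateMatchAsync(int plays, CancellationToken token)
    {
        int resolved = 0;
        for (int i = 0; i < plays; i++)
        {
            token.ThrowIfCancellationRequested();
            resolved++;          // stand-in for play resolution
            await Task.Yield();  // on a pool thread this requeues to the pool, not to a frame
        }
        return resolved;
    }

    // Starts the simulation on the thread pool. When awaited from Unity's main
    // thread, code after the await resumes on the main thread.
    public static Task<int> RunOffMainThread(int plays, CancellationToken token = default) =>
        Task.Run(() => SimulateMatchAsync(plays, token), token);
}
```

From an `async` method on a MonoBehaviour you could write `int resolved = await BackgroundSimSketch.RunOffMainThread(150);` and then update the scoreboard, since that line runs back on the main thread.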

[quote=“Tautvydas-Zilys”, post:18, topic: 897916]
Also, it seems that your code only functions correctly if the game runs at hundreds of frames per second:
[/quote]

I don’t require hundreds of frames per second myself, but it comes down to customer patience and what I can get away with. Players these days are willing to wait 1 minute to have a week’s worth of AI versus AI matches resolved. They are usually not willing to wait 10-15 minutes.

Most sports titles these days have a trivial simulation engine that gives you a somewhat acceptable result with only the bare minimum of calculations. It’s fast, but the generated statistics won’t match statistics coming from the real engine.

I’m doing experiments to see if I can do my quick sims using the actual game engine.

[quote=“Tautvydas-Zilys”, post:18, topic: 897916]
Do you have to run the simulation on the main thread? Does it interact with the scene/other objects at all? Perhaps you could offload it to another thread, in which case async/await wouldn’t be tied to frame rate?
[/quote]

I’ll take a look at that possibility. It sounds promising.

When a player is playing out the game himself, the class that resolves the match does interact with a lot of other GameObjects but when it is simming, all of those interactions are bypassed and the match gets resolved “silently/invisibly”.