How can I profile/debug issues between Client and Server worlds in Development builds?

I am running into a really frustrating issue with a game I built for a jam: there is incredibly nasty, constant hitching in the Player build of the game that isn’t present (or is much less obvious) when playing in Editor. My naive diagnosis is that the Server/Client predicted simulation is getting out of sync (maybe every time a new interpolated ghost Entity is spawned?), which causes a massive re-simulation every few seconds. However, normal profiling doesn’t reveal any obvious spikes in frame times or GC.Alloc. According to the profiler the game is running incredibly smoothly, but according to my eyeballs it’s clearly not. If it is a problem with Server/Client sync and isn’t technically caused by an obvious optimization mistake, how do I diagnose it?

Dev build of the game can be found here:

To repro, simply Start Game → Host Game → Open Lobby → Start Game

The hitching will be very obvious as you move your player around with controller or wasd. Picking up rats and holding the throw button will produce very strange predicted spawn behavior when a rat is thrown during a hitch.

Hi, you can make a development build with profiling support enabled. You can then attach the Unity profiler from the editor to the player build and check if there are any specific systems or code causing the hitching.

Right. As I mentioned in my description of the problem above, I have done exactly that and seen no abnormal frame times, runaway GC.Alloc, GPU stalls, or really any spikes of any kind. The problem seems to be in the prediction loop and is not in any way captured by the profiler. The question is: how do I diagnose a network/prediction/Server issue given that the Profiler is not useful here?

  1. What version of Netcode for Entities?
  2. Is this a client connecting to an in-proc local server, or a DGS?
  3. It’s odd that nothing is showing up in the profiler, but that does imply the issue is with netcode timings. Can you determine if the hitching is interrupting the entire game loop, or is it more like the passing of time is interrupted?
  4. Hunches/Avenues to explore:
    a. Disable all client and server tick batching (click ProjectSettings > Netcode for Entities / Multiplayer > Create to create a NetCodeConfig asset, and set the following to 1: ClientServerTickRate.MaxSimulationStepBatchSize, ClientTickRate.MaxPredictionStepBatchSizeRepeatedTick, and ClientServerTickRate.MaxSimulationStepBatchSize).
    b. Test in editor with realistic latency and see if this exacerbates the kind of hitching you’re seeing.
    c. Add a client log in Update that includes the following: Time.deltaTime, NetworkTime.ServerTick, NetworkTime.PredictedTickIndex (added in 1.2.0-pre.4), the client’s NetworkSnapshotAck.SnapshotPacketLoss.ToFixedString(), and NetworkSimulatorSettings.Enabled.
    d. Add a NetCodeDebugConfigAuthoring to your gameplay sub-scene(s) and set the mode to Debug, which will then log any netcode spawns, which you may be able to cross reference with hitching. Note: Obviously, logspam can also cause hitches.
    e. Do you have any kind of off-the-shelf in-game FPS profiling tool? The profiler should pick up the same info, but some have deeper metrics (like detailed GPU timings; see blog post and other fantastic blog post) that help rule out issues there.
    f. Similarly: include a UI widget (like a progress bar) that is updated in LateUpdate via Time.realtimeSinceStartup. It can then be compared to the gameplay in a video/gif to help diagnose.
    g. Disable jobs (via JobsUtility.JobWorkerCount = 0;).
    h. Connect your editor or dev build to the Netcode for Entities NetDebugger tool and check to see if prediction errors are encountered. You may have some severe non-determinism in your prediction code (typically caused by ghost data (like LocalTransform or Rigidbody components) being written to by non-deterministic data sources (like time), or non-GhostField fields that are accidentally modified at runtime). Non-determinism can also be caused by system filtering, stale DGS or client builds, etc.
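
For the wall-clock widget suggested in 4.f, a minimal sketch might look like the following. The class name `WallClockBar` and its field names are hypothetical, and it uses immediate-mode GUI only to avoid scene setup; a real project would likely use its own UI stack.

```csharp
using UnityEngine;

// Hypothetical helper (not part of any package): draws a bar that fills at a
// constant real-time rate, independent of game time or netcode ticks.
public class WallClockBar : MonoBehaviour
{
    // Seconds for one full left-to-right sweep.
    [SerializeField, Min(0.1f)] float periodSeconds = 2f;

    float fill; // 0..1, recomputed once per rendered frame

    void LateUpdate()
    {
        // realtimeSinceStartup is wall-clock time and ignores Time.timeScale.
        fill = Mathf.Repeat(Time.realtimeSinceStartup, periodSeconds) / periodSeconds;
    }

    void OnGUI()
    {
        const float width = 300f, height = 20f;
        GUI.Box(new Rect(10f, 10f, width, height), GUIContent.none);
        GUI.DrawTexture(new Rect(10f, 10f, width * fill, height), Texture2D.whiteTexture);
    }
}
```

In a recording, a bar that keeps sweeping smoothly while gameplay jumps suggests frames are still being presented and the problem is in simulated time. A bar that freezes along with gameplay suggests the whole loop stalled.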

I’m unable to play your demo today, but if it is netcode, it may be visible within the data (logging most NetworkTime singleton struct fields is a good idea, honestly).
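
A rough sketch of the client log from 4.c, assuming Netcode for Entities 1.x. The system name `ClientTimingLogSystem` is hypothetical, and the sketch only logs a few `NetworkTime` fields plus the estimated RTT; the other values named in the list (such as `PredictedTickIndex` and `SnapshotPacketLoss`) could be appended to the same line.

```csharp
using Unity.Entities;
using Unity.NetCode;
using UnityEngine;

// Hypothetical diagnostic system: logs once per rendered frame in the client world.
[WorldSystemFilter(WorldSystemFilterFlags.ClientSimulation)]
[UpdateInGroup(typeof(PresentationSystemGroup))]
public partial class ClientTimingLogSystem : SystemBase
{
    protected override void OnCreate()
    {
        RequireForUpdate<NetworkTime>();
    }

    protected override void OnUpdate()
    {
        var networkTime = SystemAPI.GetSingleton<NetworkTime>();

        // Ticks are invalid until the client has synchronised with the server.
        uint serverTick = networkTime.ServerTick.IsValid
            ? networkTime.ServerTick.TickIndexForValidTick
            : 0u;

        // The ack component lives on the connection entity; -1 means "not connected yet".
        float rtt = -1f;
        if (SystemAPI.TryGetSingleton<NetworkSnapshotAck>(out var ack))
            rtt = ack.EstimatedRTT;

        // Note: logging every frame can itself cause hitches, so keep captures short.
        Debug.Log(
            $"[ClientTiming] frame={UnityEngine.Time.frameCount} " +
            $"dt={UnityEngine.Time.deltaTime:F4} " +
            $"serverTick={serverTick} frac={networkTime.ServerTickFraction:F3} " +
            $"batch={networkTime.SimulationStepBatchSize} rtt={rtt:F1}");
    }
}
```

A sudden jump in `serverTick` between consecutive frames, or a `batch` value above 1, lined up against a visible hitch would point at tick batching or catch-up rather than rendering.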

  1. I’m on 1.2.4
  2. Dunno what DGS is but I assume it’s an in-proc server! I’m just building and starting the runtime in a vacuum. The way it’s set up currently (game jam code) it will sometimes connect to my machine’s global IP and sometimes to the loopback. The stalls seem to happen either way.
  3. I can observe things like the water system (stolen from Boat Attack) freezing as well, so it looks like a full main thread stall to me, but it doesn’t show up as such in the Profiler. The more obvious way it manifests, though, is a full-on ~1s rollback in positions and physics state of simulated entities, like my player character and the objects they are throwing around. So it’s not just a stall, but rubber-banding that bounces my character backwards and causes weird artifacts in how trajectory is getting computed for non-kinematic rigidbodies.
  4. This is exactly the kind of advice I was looking for. Thanks Niki! I’ll do some digging.

(you said ClientServerTickRate.MaxSimulationStepBatchSize twice - was there another setting on that list I should change?)

Could you elaborate more on that last point?

You may have some severe non-determinism in your prediction code (typically caused by ghost data (like LocalTransform or Rigidbody components) being written to by non-deterministic data sources (like time),

Most of the movement logic in the project, including Phil’s Character Controller package uses SystemAPI.Time.DeltaTime to compute a given frame’s movement vector. What else would I use to compute movement updates if not time?

or non-GhostField fields that are accidentally modified at runtime.

I definitely have lots of non-GhostFields that I write to at runtime. What’s the concern with that? My understanding was that it’s totally fine to have different values in these fields as long as it’s not informing downstream calculations on other Ghosted fields?

Ah mb - Dedicated Game Server. Broadly I was asking if this is a realistic socket connection over the public internet (with ping, jitter, PL etc), or if it was entirely in-proc or localhost testing. The fact that it shows up even in localhost is very weird.

This sounds like prediction errors. Check out the NetDebugger step (step 4.h) to view prediction error reports.

Oops, copy/paste error: I meant ClientServerTickRate.MaxSimulationStepBatchSize , ClientTickRate.MaxPredictionStepBatchSizeRepeatedTick , and ClientTickRate.MaxPredictionStepBatchSizeFirstTimeTick.

Sorry, ambiguous again - SystemAPI.Time.DeltaTime is the correct one to use here.

You’re correct - I was trying to say exactly that.
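
To illustrate the time-source point, here is a minimal sketch of predicted movement, assuming Netcode for Entities 1.x. `ExampleMoveInput`, `ExamplePredictedMoveSystem`, and `Speed` are hypothetical names that are not from the project being discussed.

```csharp
using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;
using Unity.NetCode;
using Unity.Transforms;

// Hypothetical input component; field names are illustrative only.
public struct ExampleMoveInput : IInputComponentData
{
    public int Horizontal;
    public int Vertical;
}

[UpdateInGroup(typeof(PredictedSimulationSystemGroup))]
[BurstCompile]
public partial struct ExamplePredictedMoveSystem : ISystem
{
    const float Speed = 5f; // assumed units per second

    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // Inside the prediction group this is the delta of the tick being
        // (re)simulated, so replaying a tick yields the same result.
        // A wall-clock source such as UnityEngine.Time.realtimeSinceStartup
        // would differ on every replay and show up as prediction errors.
        float dt = SystemAPI.Time.DeltaTime;

        foreach (var (transform, input) in
                 SystemAPI.Query<RefRW<LocalTransform>, RefRO<ExampleMoveInput>>()
                          .WithAll<Simulate>())
        {
            var dir = new float3(input.ValueRO.Horizontal, 0f, input.ValueRO.Vertical);
            transform.ValueRW.Position += dir * Speed * dt;
        }
    }
}
```

The `Simulate` filter restricts the loop to ghosts that should be stepped for the current prediction tick, which is the kind of system filtering mentioned above as another possible source of mismatches when it is missing.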

OK yeah I think we were on the same page for most of this, but I’m now attempting to use the NetDebugger and am a little perplexed. There doesn’t appear to be any documentation on how the tool works or even what it’s telling me?

Here’s a snapshot from my character moving around the scene with the “show prediction errors” box checked. What in this graph - if anything - represents the prediction errors? Is there information to be gleaned from this? I’m just not quite sure how to interpret what I’m looking at or how to get any information that might inform next steps.

So the above capture is from my game running in Editor, where I don’t see major hitching but do occasionally and inexplicably watch the character simply move way slower than it normally does. The following screenshot is from a development build:

So I discovered I can turn off Live Update and scroll through the report, but I’m still not seeing anything super weird going on? The ramp I have highlighted definitely corresponds to one of the in-game stalls, but I think that’s the jitter line? So I would assume network jitter would be a symptom of the problem but not the source… but again I don’t quite know what I’m looking at here :)

So I have a rather unsatisfying conclusion to this. After fully running out of things to investigate, I just downloaded my build on another machine and observed that the stall isn’t present. It seems like the hitching only happens on my dev laptop when it is plugged into my ultrawide monitor (through my usb-c dock). This seems to be a problem with something in my particular version of Unity with this particular medley of Packages, as it’s not present on any other projects in any other configurations, but it doesn’t seem to have anything to do with my code, specifically, and isn’t reproducible under any other hardware configurations. It’s frustrating that none of the tools available to me provide actionable information about what is happening here, but ultimately if it’s localized to my machine specifically when plugged into these other devices, it’s no longer worth trying to fix.