Netcode freezing

Occasionally, all clients stop either receiving or processing server updates for a few seconds (I’m not sure which). There’s no apparent lag at the time; all anims etc. continue running smoothly. This even occurs on a client on the same machine as the host (connected via IPC). When it occurs, it affects all clients at the same time.

Any idea why this might be happening? I see no warnings or errors in the console. It occurs in both player builds and the editor. It’s a fairly recent issue so I’m going to try to delve into recent commits, but as there’s no solid repro steps other than “play the game” any pointers would be much appreciated.

If you are in a build you need to enable Run In Background, otherwise this occurs when the application loses focus. This is also true for the editor, though, if you lose focus and you didn’t set Run In Background.

IPC can still show the issue, because the problem is the time that passes between updates. That probably triggers a timeout, which causes the disconnection.

Run in background is enabled for editor, and afaik isn’t supported on Android (sorry, I didn’t mention that the player builds are on Android).

Is this possible if the server simulation or predicted simulation groups take too long? There’s definitely a stall somewhere sending data, and I’m struggling to figure out why.

The prediction loop can take a very long time, depending on what you do inside it (it can predict up to N ticks in one frame on the client).
On the server, though, it should not, unless you do something that stalls for a very long time (or maybe there is just a bug somewhere).
The send system can indeed be slow without Burst, but not to the point of freezing the editor. Do you have a profile capture or logs, or can you submit something we can inspect?
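
To put rough numbers on the "up to N ticks in one frame" point, here is a minimal, engine-agnostic sketch in Python. It is not Netcode for Entities code: the function names, the cap parameter, and the simplified model (the client re-simulates roughly one round trip's worth of ticks each time a new snapshot arrives) are all assumptions made for illustration.

```python
def resimulated_ticks(rtt_ms: int, tick_rate_hz: int, max_ticks: int) -> int:
    """Ticks a client re-runs in one frame after a new snapshot (simplified model).

    Assumption: the client predicts about one round trip ahead of the server,
    so it must re-simulate that many ticks, capped at max_ticks.
    """
    # Integer ceiling of rtt_ms * tick_rate_hz / 1000.
    ticks_ahead = (rtt_ms * tick_rate_hz + 999) // 1000
    return max(1, min(ticks_ahead, max_ticks))


def prediction_cost_ms(rtt_ms: int, tick_rate_hz: int, max_ticks: int,
                       cost_per_tick_ms: float) -> float:
    """Total prediction time in one frame: ticks re-run times cost of one tick."""
    return resimulated_ticks(rtt_ms, tick_rate_hz, max_ticks) * cost_per_tick_ms


if __name__ == "__main__":
    # Hypothetical numbers: 100 ms RTT, 60 Hz simulation, 0.5 ms per predicted tick.
    print(prediction_cost_ms(100, 60, 32, 0.5))
```

The takeaway is only that the cost of the predicted systems is multiplied by the number of ticks re-run, so a system that looks cheap per tick can still dominate a client frame.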

Continuing my theory that the issue is due to slow predicted systems on the server, I’ve since run an optimisation pass (and can do more). However, the send/receive freeze still occurs. I find it strange that the rest of the game doesn’t visibly lag: anims run as normal, and even client-owned ghosts can continue to be predicted and move around the map (on the local IPC client too). It is literally as if no packets are sent, received, or processed for 1-2 seconds, sometimes more. And considering that all clients (including the local IPC client) experience this at the same time, and that after the ‘freeze’ all clients suddenly catch up and then play as normal, it must surely be server related.

In the worst case, we’ve even had one instance where, when playing solo (always via IPC), the client detected that it lost connection to the server.

There are no logs when this occurs. It’s literally just like the lag you’d expect from a client on a dodgy connection, but for everyone at once.

Burst is enabled in the editor too when this occurs.

I’m going to continue optimising server systems. But if it is because of the server sometimes running slow, surely the local IPC client would see visible FPS lag during the ‘freeze’? And given that it isn’t constant but seemingly random during gameplay, for 1+ seconds at a time, I find it hard to believe it could be as simple as the server running slow.

I’d really love to see a profile for this. Can you provide one?

Uploaded as bug IN-60339.

To repro:

  • Set default scene (Tools, Scene Autoload, Add Default Preloader)
  • Enter playmode
  • Select “runestones” on the main menu
  • Set bots to 4v4 (for higher CPU load)
  • During gameplay, you’ll occasionally see all other units stop moving for a few seconds at a time

I’ve spent the last few days doing significant optimisations, including disabling all prediction and physics systems for the IPC client, but the “freeze” is still present in both the editor and android builds.

If you find anything interesting, please let me know :)

Hey! Looking at the case. Will be back ASAP.

So good news, we’ve isolated the root cause: a misdesigned ability was adding a new element to a ghost-enabled dynamic buffer every frame. Even just this one element per frame was enough to briefly stall the server, and if the ability was activated by multiple characters, the stall lasted several seconds.
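
For anyone hitting something similar, here is a rough back-of-the-envelope model in Python of why a buffer that grows by one element per frame gets expensive. It is a sketch, not a description of how Netcode for Entities actually serialises buffers: it assumes the whole buffer is rewritten for every client each tick once it has changed (unconfirmed in this thread), and all names and numbers are hypothetical.

```python
def elements_serialized_per_tick(tick: int, growing_buffers: int, clients: int) -> int:
    """Elements written in one tick, assuming full re-serialisation per client.

    After `tick` ticks each growing buffer holds `tick` elements.
    """
    return tick * growing_buffers * clients


def cumulative_elements(ticks: int, growing_buffers: int, clients: int) -> int:
    """Total elements written over `ticks` ticks: the sum 1 + 2 + ... + ticks."""
    return growing_buffers * clients * ticks * (ticks + 1) // 2


if __name__ == "__main__":
    # Hypothetical case: 4 characters using the ability, 8 clients, 10 s at 60 Hz.
    print(elements_serialized_per_tick(600, 4, 8))
    print(cumulative_elements(600, 4, 8))
```

Under these assumptions the per-tick work grows linearly with time and the total work grows quadratically, which would be consistent with stalls that get worse the longer the ability is active and the more characters use it.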

I understand that modifying dynamic buffers isn’t optimal (all elements have to be reserialised every frame?), but this seems a little extreme. Hopefully you guys can find some optimisations in this area, or at the very least add a warning to the console when the server can’t keep up :)
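
As an illustration of the kind of warning meant here, a minimal sketch in Python of a tick-budget watchdog. This is not an existing Netcode for Entities feature; the class name, the tolerance factor, and the message format are all made up for the example.

```python
class TickBudgetWatchdog:
    """Logs a warning when a server tick takes longer than its time budget."""

    def __init__(self, tick_rate_hz: int, tolerance: float = 1.5, log=print):
        self.budget_s = 1.0 / tick_rate_hz   # time available per tick
        self.tolerance = tolerance           # allowed overshoot before warning
        self.log = log                       # where warnings are sent
        self.overruns = 0                    # number of ticks that ran over

    def record(self, elapsed_s: float) -> bool:
        """Record one tick's duration; return True if it exceeded the budget."""
        if elapsed_s > self.budget_s * self.tolerance:
            self.overruns += 1
            self.log(
                f"Server tick took {elapsed_s * 1000:.1f} ms "
                f"(budget {self.budget_s * 1000:.1f} ms); server can't keep up"
            )
            return True
        return False


if __name__ == "__main__":
    watchdog = TickBudgetWatchdog(tick_rate_hz=60)
    watchdog.record(0.010)   # within budget, no output
    watchdog.record(0.200)   # well over budget, logs a warning
```

In a real project the measured duration would come from timing the server simulation step each tick; something like this in our own code would have pointed at the server long before we found the buffer.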

Can you send us a profiler dump of this stall? And how big did this DynamicBuffer become to trigger stalls?

Bug report IN-60339 and support ticket 1705106 are both open, with a build attached.