Ghosts sometimes use old snapshot data

The title is what I think is happening. It is most noticeable when ghosts teleport to an older position and then teleport back. The most reproducible trigger is adding a debug component to a ghost on the server, waiting about a second, and then removing it again.

NetDbg looks like this:

The purple line is the snapshot age, which goes up a lot when we remove the component.

My hypothesis is that the ghost uses an old snapshot when it is moved back to an old chunk. I think I fixed this by clearing the snapshot history for all the entity slots not in use in the chunk (see diff below), but I’m sure there is some nuance here that I am missing, so any feedback would be welcome :)

diff --git a/Packages/netcode/Runtime/Snapshot/GhostSendSystem.cs b/Packages/netcode/Runtime/Snapshot/GhostSendSystem.cs
index 9e95b0ac8..5b6d5cfc5 100644
--- a/Packages/netcode/Runtime/Snapshot/GhostSendSystem.cs
+++ b/Packages/netcode/Runtime/Snapshot/GhostSendSystem.cs
@@ -1167,6 +1167,19 @@ namespace Unity.NetCode
                         }
                     }

+                    // Clear data from entities which have moved out of the chunk
+                    if (chunkSerializationData.TryGetValue(chunk, out var chunkData))
+                    {
+                        for (ent = chunk.Count; ent < chunk.Capacity; ++ent)
+                        {
+                            for (int hp = 0; hp < GhostSystemConstants.SnapshotHistorySize; ++hp)
+                            {
+                                var clearSnapshotEntity = chunkData.GetEntity(snapshotSize, chunk.Capacity, hp);
+                                clearSnapshotEntity[ent] = Entity.Null;
+                            }
+                        }
+                    }
+
                     uint anyChangeMask = 0;
                     int skippedEntityCount = 0;
                     int relevantGhostCount = chunk.Count - serialChunks[pc].startIndex - irrelevantCount;

I would need to investigate this more to figure out all the details. Can you report a bug on this so I get all the details and can investigate?

The change looks expensive but should cover the most common cases. Because it only clears the additional entities when the chunk is sent, I would assume there is still a chance it does not run if you change the archetype and then change it back quickly enough.

I did try to set up a minimal repro for this, but it wasn’t as easy as I had hoped. I think there is something more here that I am missing. Hopefully I can look at it some more this week, but that depends on how much time I can find for it.

I finally got some time to look into this a bit more. I opened ticket 1324997 for a bug that I managed to reproduce in a simpler setup, but it doesn’t seem to be quite the same thing that happens here. It does seem like it could be related, though.

I think I found the cause of both the initial problem and the one reported in 1324997. The first problem reported here is a bit trickier to reproduce, so I haven’t been able to fully verify that one yet, but I’m pretty confident it is fixed now.

It boils down to this comment in GhostSendSystem:
// FIXME: should be using chunk sequence number instead of this hack

The hack is that it uses the archetype to detect a new chunk, so a new chunk that happens to have the same archetype goes undetected and we get problems. It seems like the only reason it doesn’t use the sequence number is that the property just isn’t exposed in ArchetypeChunk, so I added that property and used it instead of the archetype.
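
To illustrate the difference between the two checks, here is a small standalone sketch. It is not the actual GhostSendSystem code: `FakeChunk`, `ChunkReuseDemo`, and both helper methods are hypothetical names, and it assumes that cached per-chunk state is looked up by something that can be reused (modeled here as an address) while the sequence number is different for every chunk allocation.

```csharp
using System;

// Hypothetical stand-in for a chunk; this is NOT Unity.Entities.ArchetypeChunk.
public readonly struct FakeChunk
{
    public readonly int Address;          // lookup key that may be reused by a later chunk (assumption)
    public readonly int ArchetypeId;      // identifies the component layout
    public readonly ulong SequenceNumber; // assumed to differ for every chunk allocation

    public FakeChunk(int address, int archetypeId, ulong sequenceNumber)
    {
        Address = address;
        ArchetypeId = archetypeId;
        SequenceNumber = sequenceNumber;
    }
}

public static class ChunkReuseDemo
{
    // The "hack": treat the chunk as new only if the archetype changed.
    public static bool LooksNewByArchetype(FakeChunk cached, FakeChunk current)
        => cached.ArchetypeId != current.ArchetypeId;

    // The replacement check: treat the chunk as new if the sequence number changed.
    public static bool LooksNewBySequenceNumber(FakeChunk cached, FakeChunk current)
        => cached.SequenceNumber != current.SequenceNumber;

    public static void Main()
    {
        // A chunk is destroyed and a different chunk with the same archetype
        // ends up under the same lookup key.
        var original = new FakeChunk(address: 0x1000, archetypeId: 7, sequenceNumber: 41);
        var reused = new FakeChunk(address: 0x1000, archetypeId: 7, sequenceNumber: 42);

        Console.WriteLine(LooksNewByArchetype(original, reused));      // False: stale history would be kept
        Console.WriteLine(LooksNewBySequenceNumber(original, reused)); // True: history would be reset
    }
}
```

In this model the archetype check misses the replacement, which matches the symptom of a ghost briefly using old snapshot data, while the sequence number check catches it.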
