I’ve been experimenting with networking settings, trying to get latency as low as possible. I’ve run into a wall, though, and I’m really not sure what’s going on, how to figure it out, or whether there’s anything I can do about it!
I put together a little test project that just sends RPCs back and forth.
I output the time it takes messages to get from client to host and back to the client.
Running the two builds on the same computer I get something like 16-33ms.
I ping myself and I get 0ms of course. I output NetworkClient.GetRTT() and I get 0ms of course.
So where is that extra time being spent and is there anything at all I can do about it?
Things I’ve tried:
Upgrading from Unity 5.5 to 5.6 (makes it worse)
Setting Max Delay to 0
Setting Min Update Timeout to 1ms
AckDelay
SendDelay
NetworkConnection.FlushChannels()
Sending NetworkMessages
Sending Command / RPC
Sending NetworkWriter
SendBytes
Changing Reactor Model
Unreliable channel
sendInterval = 0
GetNetworkTimestamp()
Time.realtimeSinceStartup
Stopwatch
Manually calling UNetStaticUpdate via reflection
Nothing seems to make any difference at all. I’m completely out of ideas. If anyone wants to recreate the issue I attached a sample project.
I’d really appreciate any insight into this. 30ms doesn’t make much of a difference when there’s no latency and only one hop but when you’re dealing with owner->relay->host->relay->other client->relay->host->relay->owner for two non-host players to interact, every ms counts.
Hmm… I’m going to guess that considering the tests you’ve done, the cpu time could be generating your 16-33ms. For example, I don’t use RPCs, I use lower-level stuff such as the NetworkClient/NetworkServerSimple.SendWriter methods to send data over the network because I know that the only overhead is the construction of a NetworkWriter object on the sending side, and a NetworkReader on the receiving end. I mentioned this overhead to @aabramychev as it could be reduced by re-using cached instances, hopefully this is something that gets improved in patch 1.
Personally, when I test locally, I see a latency of 2-3ms, which would suggest that the work required to process RPCs is a little heavier. If you think about how RPCs work this makes sense, since at the very least they are going to need to do some form of reflection. Reflection isn’t particularly CPU friendly…
I could be totally wrong here, but it would be good to know if you get the same results with NetworkClient.SendWriter etc?
@donnysobonny
Yeah, no such luck. It seemed promising enough since there really are some reflections and dictionary lookups in there that can be avoided. Unfortunately it doesn’t seem to actually gain me anything.
I just updated the attached project to include a NetworkWriter test. I use a custom network connection class so I can catch the bytes in TransportReceive(). I send the time from the host to client, then send it back from client to host as soon as it is received. Still takes 17-34ms.
Can you maybe take a look at my code and see if I’m doing anything wrong (it’s really not much code). Or maybe just build it and run it locally and see if you get the same results? Maybe there’s just some problem with my local network.
It is in fact so little code that I might as well post it here:
using System.Collections;
using System.Diagnostics;
using UnityEngine;
using UnityEngine.Networking;

public class NetworkManager : UnityEngine.Networking.NetworkManager
{
    static Stopwatch sw = new Stopwatch();
    static float elapsedTime;
    static byte error;

    public override void OnStartServer()
    {
        // Route raw received bytes through CustomConnection on the host.
        NetworkServer.SetNetworkConnectionClass<CustomConnection>();
    }

    public override void OnStartClient(NetworkClient client)
    {
        // Same on the client, so the echo happens before any HLAPI parsing.
        client.SetNetworkConnectionClass<CustomConnection>();
    }

    public override void OnServerConnect(NetworkConnection conn)
    {
        // Skip the host's own local connection (hostId == -1).
        if (conn.hostId != -1) StartCoroutine(continuouslyPingClient(conn));
    }

    IEnumerator continuouslyPingClient(NetworkConnection conn)
    {
        while (gameObject.activeSelf)
        {
            // Restart the timer and send a single byte once per second.
            sw.Reset(); sw.Start();
            NetworkTransport.Send(conn.hostId, conn.connectionId, 0, new byte[] { 1 }, 1, out error);
            yield return new WaitForSeconds(1.0f);
        }
    }

    private void OnGUI()
    {
        // Show the last measured round trip in milliseconds.
        GUI.Label(new Rect(500, 10, 100, 100), elapsedTime.ToString());
    }

    public static void ReceiveBytes(byte[] bytes, int receivedSize)
    {
        if (!NetworkServer.active)
        {
            // Client: echo the byte straight back to the host.
            NetworkTransport.Send(singleton.client.connection.hostId, singleton.client.connection.connectionId, 0, bytes, 1, out error);
        }
        else
        {
            // Host: the echo has arrived, so stop the timer.
            sw.Stop();
            elapsedTime = sw.ElapsedMilliseconds;
        }
    }
}

class CustomConnection : NetworkConnection
{
    public override void TransportRecieve(byte[] bytes, int numBytes, int channelId)
    {
        // Bypass the default handler and hand the raw bytes to the test code.
        NetworkManager.ReceiveBytes(bytes, numBytes);
    }
}
I’m even using NetworkTransport.Send to avoid the overhead of wrapping the bytes in a NetworkReader / Writer. No effect.
I just tried testing with regular old sockets and I get right around 0ms pretty consistently. So it’s not my network (not that it ever should have been for connecting two builds on the same PC), and it’s not just standard overhead. It’s something happening somewhere between the socket layer and LLAPI as far as I can tell.
Hmm, I haven’t actually tested your code, but I’ve had a look at my solution, which is different from yours and performs better at a lower level, and I see the same issue. I have a funny feeling, though, that this might be an intentional change, since in real-time games you want at least some latency: it allows the internals of the framework to combine small messages into larger ones, making for more efficient real-time networking (fewer large messages are better than more small messages, due to the per-packet overhead). Again though, I could be totally wrong and it could be an unintentional bug. Either way, we really need some input from @aabramychev on this one to clarify.
To give you a few tips on the route that you’re taking, particularly since you are looking for the best performance:
avoid using the NetworkManager. It has a lot of bloat that you can absolutely avoid to increase performance
avoid using RPCs, Commands and Syncvars. They are convenience tools, and can be avoided
instead, use messages. Check out the NetworkClient.RegisterHandler and NetworkServerSimple.RegisterHandler methods, these allow you to register a delegate to listen for certain message types. You can use your SendWriter methods on the sending-end to send very efficiently. Don’t worry too much about the construction of the reader/writer overhead, it’s currently only 32 bytes of garbage which I suspect will be fixed in a nearby patch
if you don’t need websockets, avoid everything HLAPI, and go all-out LLAPI. The HLAPI is built mainly for convenience, so even if you use it at a low-ish level as described above, you’ll still inherit performance reductions
Hopefully this helps you, let’s wait for Alex’s response on your findings.
@donnysobonny Thanks for the tips but I’m already doing all that. I’m using the NetworkManager in that example just for OnStartServer/Client but the actual sending is done via NetworkTransport.Send which is the lowest level you can get in UNet.
Yeah, you’re not the first person to tell me this, and I’m sure you’re right, I would just like a way to configure how long it waits or maybe have a high priority queue that doesn’t wait at all. I thought that’s what some of the settings would let me do (such as minUpdateTimeout) but I’ve tried everything.
As an additional note, in 5.6 the RTT is worse unless I set the new sendDelay parameter to 0. The sendDelay seems to be exactly what you’re describing, but apparently it is separate from the delay I’m seeing.
It is easy to explain. Imagine that the RTT between us is 0ms (ping 0ms), but your frame and my frame are each one day long. I can send a message only once per day and I can read messages only once per day. My question is: what will the latency be here?
You send me a message at 9am; at 9am + 0ms it is delivered, but unfortunately I’ve already gone… and will be able to read the message only in the evening.
The main thread (the Unity main thread), where you call sends and receives, has a frame rate. It can be 30, 60 or 120 frames per second, and when you measure time you need to take this fact into account.
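To put rough numbers on the frame argument, here is a minimal sketch in plain C# (no Unity calls). It assumes a wire delay of zero, a host that polls at the start of each frame, and a client that echoes the moment it polls. The class name, method names and parameters are made up for illustration and are not part of UNET.

```csharp
using System;

public static class FrameLatencyDemo
{
    // Earliest poll time strictly after t for a loop that polls at phase + n * period.
    public static double NextPoll(double t, double period, double phase)
    {
        double n = Math.Floor((t - phase) / period) + 1.0;
        return phase + n * period;
    }

    // Round trip as seen by the host when the network itself adds 0ms.
    // Assumption: the host polls at 0, hostPeriod, 2 * hostPeriod, ...
    // and the client polls at clientPhase + n * clientPeriod.
    public static double ObservedRoundTripMs(double sendTime, double hostPeriod,
                                             double clientPeriod, double clientPhase)
    {
        // The client only sees the message at its next poll, then echoes it at once.
        double clientReads = NextPoll(sendTime, clientPeriod, clientPhase);
        // The host only sees the echo at its own next poll.
        double hostReads = NextPoll(clientReads, hostPeriod, 0.0);
        return hostReads - sendTime;
    }
}
```

With both sides at 60 frames per second (a period of 1000.0 / 60.0 ms) and a send 1ms into the host frame, ObservedRoundTripMs gives about 15.7ms when the client polls 5ms into the host frame and about 32.3ms when it polls 0.5ms in. That is roughly the 16-33ms range reported above, even though nothing was spent on the wire.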
How do I get / set the main thread frame rate? Is that Time.fixedDeltaTime or Time.deltaTime or something else?
I will say though, the time that I’m interested in isn’t really RTT, it’s the time it takes between sending the message and actually being able to do anything with it, which would include this frame time I guess.
I do understand much better now why my pure socket example was so much faster though.
I mean the following:
Take a look at the picture. Even though the message was delivered almost immediately (black arrow), you will be able to read it only on the next frame (red arrow).
Yeah, I think I understand the general idea at least, just trying to nail down the specifics. So the frame duration is Time.deltaTime?
Also, is there any possibility at all of getting the message the moment it is received, rather than waiting for the main thread to poll for it? I understand unity is not thread safe so this is dangerous, but as long as I only do thread safe things it should be fine right?
If you need lag compensation tools, you can use the timing service provided with UNET:
1. Call NetworkTransport.GetNetworkTimestamp()
2. Add the timestamp you received to the message you send
3. When you receive the message, extract this timestamp
4. Call NetworkTransport.GetRemoteDelayTimeMS() with the extracted timestamp as a parameter
As a result you will get the delay in milliseconds between steps 1 and 4
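The steps above could look roughly like this in LLAPI code. This is only a sketch: the DelayProbe class and its method names are made up, the 4-byte layout is an assumption, and BitConverter is used on the assumption that both ends share the same byte order. It needs a Unity project with UNET to run.

```csharp
using System;
using UnityEngine;
using UnityEngine.Networking;

public static class DelayProbe
{
    // Sender side: take the local network timestamp and put it in the message.
    public static void SendTimestamp(int hostId, int connectionId, int channelId)
    {
        int timestamp = NetworkTransport.GetNetworkTimestamp();
        byte[] buffer = BitConverter.GetBytes(timestamp); // hypothetical 4-byte layout
        byte error;
        NetworkTransport.Send(hostId, connectionId, channelId, buffer, buffer.Length, out error);
        if (error != (byte)NetworkError.Ok)
            Debug.LogWarning("Send failed: " + (NetworkError)error);
    }

    // Receiver side: extract the timestamp and ask UNET how long ago it was taken.
    public static int DelayMs(int hostId, int connectionId, byte[] buffer)
    {
        int remoteTimestamp = BitConverter.ToInt32(buffer, 0);
        byte error;
        int delay = NetworkTransport.GetRemoteDelayTimeMS(hostId, connectionId, remoteTimestamp, out error);
        if (error != (byte)NetworkError.Ok)
            Debug.LogWarning("GetRemoteDelayTimeMS failed: " + (NetworkError)error);
        return delay;
    }
}
```

Calling DelayMs from wherever the bytes are received (for example a TransportRecieve override) gives the one-way delay including the receiver’s frame wait, which is the number the original question is really about.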
Q: Also, is there any possibility at all of getting the message the moment it is received, rather than waiting for the main thread to poll for it? I understand unity is not thread safe so this is dangerous, but as long as I only do thread safe things it should be fine right?
Yes and no. In the server.dll product we supported a receive callback, so when a message comes in the user is notified about the event immediately. For the client library we have not decided yet.
(Main reason) The callback will be called from a UNET worker thread, which means that any additional work will block the network threads and block the library’s work. So, to use the callback properly you would need to create a thread-safe queue of events, so that the function returns as soon as possible. As this technique requires pretty high qualification, we are just afraid to expose it…
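For anyone wondering what such a thread-safe queue of events might look like, here is a minimal sketch in plain C#. The receive callback itself is hypothetical (the client library does not expose one), and the ReceivedEventQueue class is made up for illustration. It uses a lock around a plain Queue rather than ConcurrentQueue, on the assumption that the project targets Unity’s older .NET 3.5 profile.

```csharp
using System;
using System.Collections.Generic;

public sealed class ReceivedEventQueue
{
    private readonly object gate = new object();
    private readonly Queue<byte[]> pending = new Queue<byte[]>();

    // Would be called from the network worker thread: copy the bytes and return
    // right away so the network thread is never blocked by game logic.
    public void Enqueue(byte[] buffer, int size)
    {
        byte[] copy = new byte[size];
        Buffer.BlockCopy(buffer, 0, copy, 0, size);
        lock (gate)
        {
            pending.Enqueue(copy);
        }
    }

    // Called once per frame on the main thread: take everything out under the
    // lock, then run the handler outside it. Returns the number of messages handled.
    public int Drain(Action<byte[]> handler)
    {
        byte[][] batch;
        lock (gate)
        {
            batch = pending.ToArray();
            pending.Clear();
        }
        foreach (byte[] message in batch)
        {
            handler(message);
        }
        return batch.Length;
    }
}
```

The point of the design is that the lock is held only for the enqueue and for the copy-out, never while a handler runs, so a slow handler on the main thread cannot stall the worker thread.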
If you do all of your work based on the frame, it doesn’t really matter when you receive a message if you cannot handle it. Back to the picture (the recv boxes on top should be moved left :( ): add a “Handle Input” square just after the frame start. In this case you will receive the message in the same frame but can handle it only in the next…
(In the picture the recv boxes should start immediately after the frame starts… my fault.)
I have done exactly that and the remote delay each way seems to add up to the round trip time so that all makes perfect sense.
Thank you for helping me understand everything. Almost everyone I asked gave me misinformation about sendDelay and grouping messages and stuff that had nothing to do with the actual problem so it’s great to hear from someone on the inside who actually knows.
I see exactly what you’re saying about the threading too. It really wouldn’t matter if I could receive the message immediately; nothing I could do with it would be able to affect anything until the next main thread update anyway.
Except for one thing! On the host there are a lot of messages that I receive from the owner and immediately echo back out to all clients. I think it might save a not-insignificant amount of time if I could receive those messages and send them back out without ever having to hit the main thread at all.
sendDelay usually makes sense. The IP header is 20 bytes and the UDP header is 8 bytes, plus the header of the UNET packet, plus a header per message. If your message is 40 bytes long, your relative payload will be ~50%. To avoid this and increase “energy conversion efficiency” we use two different approaches (in 5.6):
The SendDelay parameter: your first message is delayed for this timeout, in the expectation that you will send more.
QueueMessageForSending/SendQueuedMessages: the first function doesn’t send anything but prepares a batch of messages, while the second sends the whole batch. You can use this second approach: call QueueMessageForSending from any place in your frame and call SendQueuedMessages just before the frame ends, to be sure that all messages you send during the frame go out at the same time.
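The payload arithmetic can be sketched in a few lines of plain C#. The 20-byte IP header and 8-byte UDP header come from the post above. The UNET packet header and per-message header sizes are not given in this thread, so they are left as parameters, and the values used in the note below (8 and 4 bytes) are placeholders picked only so that the single-message case reproduces the ~50% figure. The class and method names are made up.

```csharp
using System;

public static class PacketEfficiency
{
    public const int IpHeaderBytes = 20;  // from the post above
    public const int UdpHeaderBytes = 8;  // from the post above

    // Fraction of the bytes on the wire that is actual message payload.
    // unetPacketHeaderBytes and perMessageHeaderBytes are assumptions supplied
    // by the caller; the real UNET header sizes are not stated in this thread.
    public static double PayloadFraction(int messageBytes, int messagesPerPacket,
                                         int unetPacketHeaderBytes, int perMessageHeaderBytes)
    {
        int payload = messageBytes * messagesPerPacket;
        int overhead = IpHeaderBytes + UdpHeaderBytes + unetPacketHeaderBytes
                       + perMessageHeaderBytes * messagesPerPacket;
        return (double)payload / (payload + overhead);
    }
}
```

With the placeholder header sizes, PayloadFraction(40, 1, 8, 4) is 0.5, while batching ten such messages into one packet with PayloadFraction(40, 10, 8, 4) gives about 0.84. That gain is what both SendDelay and the queue-then-send approach are after, at the cost of holding the first message back a little.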
Q: Except for one thing! On the host there are a lot of messages that I receive from the owner and immediately echo back out to all clients. I think it might save a not-insignificant amount of time if I could receive those messages and send them back out without ever having to hit the main thread at all.
Yes, it is. But, hmm, OK, this is my personal opinion, not related to the company.
But (theorem) immediately after this another 100 users will try to update the world in this callback, and will generate 1000 bugs about why “my network layer doesn’t work”. To avoid this bug storm we need to implement something which allows this feature to be used safely. Unfortunately, we still do not have a clean solution for that.
Almost the same situation with threading: theoretically the library is thread-safe per connection (for sending) and per host (for receiving), but… the same reason.
I guess we will discuss callbacks internally again, and will probably publish them with something like “use with care”.
That is great to hear. Alternatively though, at least for my specific case, it may be safer / easier to implement some way for clients to say “I want this command / NetworkMessage to be echoed from the host to other clients without touching the main thread at all” if that makes any sense. The callback would be a more generic solution that could probably be used for other neat things that I haven’t even thought of, but I think you’re right about users abusing it. Even with a warning in the doc it’s just inevitable.