I am attempting to get a VERY minimal amount of communication going over the UNet messaging system. On two instances over localhost, it works like a charm (even, iirc, with network simulation latency & packet drop set up). On the production test LAN, things fall apart, generally in under half an hour.
After running for some time, during which all clients are able to connect and reconnect successfully, the first symptom is the server noticing a timeout and then apparently tripping over itself somewhere deep in the UNet internal cleanup code:
```
Log: connection {2} has been disconnected by timeout; address {::ffff:192.168.35.26:61645}
ArgumentOutOfRangeException: Argument is out of range.
Parameter name: index
  at System.Collections.Generic.List`1[UnityEngine.Networking.NetworkConnection].get_Item (Int32 index) [0x00000] in :0
  at UnityEngine.Networking.ConnectionArray.GetUnsafe (Int32 connId) [0x00000] in :0
  at UnityEngine.Networking.NetworkServer.InternalUpdate () [0x00000] in :0
  at UnityEngine.Networking.NetworkServer.Update () [0x00000] in :0
  at UnityEngine.Networking.NetworkIdentity.UNetStaticUpdate () [0x00000] in :0
```
After that point, the connection index of each subsequent connection on the server is permanently offset (by one more each time the above error shows up). Moreover, if the same client attempts to reconnect, the initial connection succeeds, but it is followed by a slew of
Warning: received system packet belongs to wrong session
logs before the connection times out:
Log: connection {1} has been disconnected by timeout; address {::ffff:192.168.35.45:4444}
UNet Client Disconnect Error: Timeout
The client is never able to successfully maintain a connection after this point. Restarting the client app doesn’t help. Disabling and re-enabling the client machine’s network interface temporarily solves the problem, but it inevitably comes back, generally in under an hour. It seems to affect different machines at different times (e.g. one laptop can be persistently unable to connect while another laptop of a different make/model is just fine, then two hours later the situation is reversed). It happens both when running with NetworkManager (direct IP, no matchmaking) and with basic NetworkServer.Listen(port) / networkClientInstance.Connect(ip,port) calls.
Has ANYONE else seen this behavior? It seems awfully severe for me to not be seeing any hits on Google. I’ve already tried various ways of manually flushing and disposing the connection on both ends after disconnect in an attempt to get the old connection/session out of the system, to no avail.
I can’t post too much code due to NDA, but the gist of the implementation is a persistent static class called NetworkMessenger which serves as a readily accessible root hub for all message subscription and relaying within the game. (Please don’t lecture me on the proper use of the singleton paradigm in Unity. Yes, I’ve tried it. Yes, it makes less sense in the context of this project. No, I do not believe that usage of sockets ought to have anything to do with whether or not they’re on an instance or global/static.)
The relevant connection/setup logic is as follows:
```csharp
static void SetupClient()
{
    if (_isSetup)
    {
        Debug.Log("NetworkMessenger: SetupClient called when already set up");
        return;
    }
    if (initBehavior == eInitBehavior.BE_CLIENT) m_client = new NetworkClient();
    else if (initBehavior == eInitBehavior.USE_HLAPI) m_client = NetworkManager.singleton.client;
    else
    {
        Debug.LogError("NetworkMessenger: SetupClient called on server configuration!");
        return;
    }
    RegisterClientMessages();
    m_client.RegisterHandler(MsgType.Connect, OnConnectMessage);
    m_client.RegisterHandler(MsgType.Ready, OnReadyMessage);
    m_client.RegisterHandler(MsgType.Disconnect, OnDisconnectMessage);
    m_client.RegisterHandler(MsgType.Error, OnErrorMessage);
    if (initBehavior == eInitBehavior.BE_CLIENT) m_client.Connect(serverIP, port);
    _isSetup = true;
}

static void SetupServer()
{
    if (_isSetup)
    {
        Debug.Log("NetworkMessenger: SetupServer called when already set up");
        return;
    }
    if (initBehavior == eInitBehavior.BE_CLIENT)
    {
        Debug.LogError("NetworkMessenger: SetupServer called on client configuration!");
        return;
    }
    RegisterServerMessages();
    NetworkServer.RegisterHandler(MsgType.Connect, OnServerConnectMessage);
    NetworkServer.RegisterHandler(MsgType.Ready, OnServerReadyMessage);
    NetworkServer.RegisterHandler(MsgType.Disconnect, OnServerDisconnectMessage);
    NetworkServer.RegisterHandler(MsgType.Error, OnServerErrorMessage);
    if (initBehavior == eInitBehavior.BE_SERVER) NetworkServer.Listen(port);
    Debug.Log("NetworkMessenger listening on "+port+" ("+NetworkServer.active+")");
    m_client = ClientScene.ConnectLocalServer();
    RegisterClientMessages();
    m_client.RegisterHandler(MsgType.Connect, OnConnectMessage);
    m_client.RegisterHandler(MsgType.Ready, OnReadyMessage);
    m_client.RegisterHandler(MsgType.Disconnect, OnDisconnectMessage);
    m_client.RegisterHandler(MsgType.Error, OnErrorMessage);
    _isSetup = true;
}

public static void Init()
{
    if (_isSetup) return;
    if (initBehavior == eInitBehavior.BE_CLIENT)
    {
        SetupClient();
    }
    else if (initBehavior == eInitBehavior.BE_SERVER)
    {
        SetupServer();
    }
    else
    {
        Debug.LogError("Cannot Init() NetworkMessenger with initBehavior USE_HLAPI. Use InitHLAPI() instead.");
    }
}

public static void InitHLAPI(NetworkBehaviour nb)
{
    if (_isSetup) return;
    initBehavior = eInitBehavior.USE_HLAPI;
    if (!nb.isServer)
    {
        SetupClient();
    }
    else
    {
        SetupServer();
    }
}
...
static void OnDisconnectMessage(NetworkMessage netMsg)
{
    DisconnectMessageDummy msg = new DisconnectMessageDummy();
    msg.server = false;
    InformSubscribers(DisconnectMessageSubscribers, msg);
    if (initBehavior == eInitBehavior.BE_CLIENT)
    {
        //m_client.Connect(serverIP, port); // auto-reconnect try 1 - server still retains connection(?) and gets out of sync
        // getting many "Warning: received system packet belongs to wrong session" logs
        // which I can only surmise means something is not getting fully torn down before the socket restores
        // and the client and/or server are putting data on the old connection rather than a new one.
        // So tear down just as much as we can before retrying.
        _isSetup = false;
        m_client.connection.FlushChannels();
        m_client.Shutdown();
        //m_client.Disconnect();
        //m_client.connection.Dispose();
        m_client = null;
        //SetupClient(); // auto-reconnect try 2 - server can still seemingly accept connection 3 before fully tearing down 2 and thus get out of sync
        //DelayedReconnect(1.0f); // auto-reconnect try 3 - no static coroutines allowed :P
        if (autoreconnect) SetupClient(); // try 4 - only auto-reconnect immediately if set to do so; leave delayed reconnect to outside scripts
    }
}

static void OnServerDisconnectMessage(NetworkMessage netMsg)
{
    DisconnectMessageDummy msg = new DisconnectMessageDummy();
    msg.server = true;
    InformSubscribers(DisconnectMessageSubscribers, msg);
    // still getting wrong-session messages on the clients. So. Force server teardown?
    netMsg.conn.FlushChannels();
    netMsg.conn.Dispose();
}
```
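For reference, the "leave delayed reconnect to outside scripts" part can be sketched roughly as below. This is a minimal sketch, not the actual project code: the class name `NetworkMessengerReconnector`, the `reconnectDelay` field and the `ScheduleReconnect()` method are hypothetical names made up for illustration. It assumes `autoreconnect` is false, that `initBehavior` is BE_CLIENT, and that `ScheduleReconnect()` gets called from whatever disconnect subscriber the game registers. Because a static class cannot run coroutines, the wait lives on a MonoBehaviour and only calls back into the public `NetworkMessenger.Init()`.

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical helper, not part of the original NetworkMessenger code.
// Attach to a persistent GameObject on the client.
public class NetworkMessengerReconnector : MonoBehaviour
{
    // Seconds to wait before retrying (assumed value, tune as needed).
    public float reconnectDelay = 1.0f;

    // Tracks the queued retry so repeated disconnect callbacks do not stack up.
    Coroutine pending;

    // Call this from a disconnect subscriber instead of reconnecting inline.
    public void ScheduleReconnect()
    {
        if (pending != null) return; // a retry is already queued
        pending = StartCoroutine(ReconnectAfterDelay());
    }

    IEnumerator ReconnectAfterDelay()
    {
        // By the time this resumes, OnDisconnectMessage has finished its
        // teardown (m_client shut down, _isSetup reset to false).
        yield return new WaitForSeconds(reconnectDelay);
        pending = null;
        NetworkMessenger.Init(); // returns early if already set up
    }
}
```

The delay only gives the server time to notice and clean up the old connection before a new one arrives; it does not by itself address the wrong-session state described above.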
Updates as of 2015-12-23:
5.2.3 appeared to fix this somewhat. At least some connections are able to succeed after the server trips its ArgumentOutOfRange errors.
However, 5.2.3 resurfaced a hosts-exceeded issue which must still be resolved by manually removing host entries on disconnect a la http://forum.unity3d.com/threads/maximum-hosts-cannot-exceed-16.359579/
Further testing reveals the issue still exists on 5.2.3 (not jumping to 5.3 yet given the large number of other reported issues). I am still getting the ArgumentOutOfRange exceptions fairly regularly, and while more reconnects succeed, it is possible for clients running for an extended duration across multiple reconnects to slip into a state where they connect but get a dozen or so wrong-session errors before timing out.
I have tried various methods of cleaning up the server connections on disconnect, including
```csharp
netMsg.conn.FlushChannels();
netMsg.conn.Dispose();
```
or
```csharp
// clean up ALL connections originating from the same host
List<int> connIDs = new List<int>();
foreach (NetworkConnection c in NetworkServer.connections)
{
    if (c!=null && c.hostId == hostId)
    {
        connIDs.Add(c.connectionId);
    }
}
foreach (int id in connIDs)
{
    NetworkServer.RemoveExternalConnection(id);
}
// AND depop the null connections which seem to build up
while (NetworkServer.connections.Contains(null))
{
    NetworkServer.connections.Remove(null);
}
```
which don’t functionally break anything further but can result, somewhat expectedly, in complaints that a few trailing messages are being sent to already-destroyed connections. They do not, however, eliminate any of the original symptoms; the latter simply eliminates a new symptom of the “same” connection getting multiple disconnect triggers per disconnect, e.g.
Log: connection {2} has been disconnected by timeout; address {::ffff:192.168.35.215:51130}
Log: connection {1} has been disconnected by timeout; address {::ffff:192.168.35.215:51130}
Log: connection {2} has been disconnected by timeout; address {::ffff:192.168.35.215:51130}
Log: connection {1} has been disconnected by timeout; address {::ffff:192.168.35.215:51130}
Log: connection {2} has been disconnected by timeout; address {::ffff:192.168.35.215:51130}
Log: connection {1} has been disconnected by timeout; address {::ffff:192.168.35.215:51130}
Log: connection {2} has been disconnected by timeout; address {::ffff:192.168.35.215:51130}
Log: connection {1} has been disconnected by timeout; address {::ffff:192.168.35.215:51130}
Log: connection {2} has been disconnected by timeout; address {::ffff:192.168.35.215:51131}
which itself seems to speak to even more connection cleanup issues in UNet’s under-the-hood management.
Killing/bouncing the entire host for a particular connection (NetworkTransport.RemoveHost(netMsg.conn.hostId)) also does more harm than good, and has thus far only served to prevent any reconnects whatsoever.