We updated a couple of our games from Unity 2018 LTS to Unity 2019.4.28f1 in the last couple of months.
Immediately after rolling out the update built with Unity 2019, we started to receive backend notifications regarding increased latency. I have attached a graph that depicts the latency increase for one of our backend services.
In our debugging, we found that for these “hanging requests”, the HTTP POST request body didn’t seem to be sent, so our backend service keeps waiting until it eventually kills the request for taking too long. We see this for both Android and iOS requests, but Android seems to be worse.
On the client-side, we time requests out after 10 seconds. Our backend service will time a request out after 20 seconds. In the ideal case, we would expect that the client would time a request out after 10 seconds and the connection would be closed. Interestingly, the connection doesn’t appear to be closed as expected and we end up receiving the backend service timeout after 20 seconds, even though our logging tells us that the client-side request did in fact time out after 10 seconds.
I understand that there is a lot to dig into here. I am more than happy to provide any extra information and help debug if anyone has an idea about where to start. I am curious whether anyone else has experienced these kinds of symptoms after upgrading. Thanks in advance!
Thanks for the reply! Since the majority of requests are completing as expected, I think it’s unlikely to be a yield problem. I believe we are respecting the IDisposable interface as well. Here is the main underlying method that is being used to dispatch all requests in our games:
Try setting useHttpContinue to false on UnityWebRequest.
As for connections, we now support keep-alive, so connections are intentionally left open for future reuse.
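For reference, a minimal sketch of what disabling the Expect header could look like on a raw POST. The class name, URL, and payload are made up for illustration, and `isNetworkError`/`isHttpError` are the 2019.4-era checks (newer Unity versions mark them obsolete in favour of `result`):

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical example component; not taken from the project discussed here.
public class PostWithoutExpect : MonoBehaviour
{
    // Placeholder endpoint, for illustration only.
    private const string Url = "https://example.com/api/endpoint";

    private IEnumerator Start()
    {
        // Placeholder JSON payload.
        byte[] body = Encoding.UTF8.GetBytes("{\"hello\":\"world\"}");

        using (var request = new UnityWebRequest(Url, UnityWebRequest.kHttpVerbPOST))
        {
            request.uploadHandler = new UploadHandlerRaw(body);
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");

            // Ask Unity not to add "Expect: 100-continue" to this request.
            request.useHttpContinue = false;

            // Overall time limit for the whole request, in seconds.
            request.timeout = 10;

            yield return request.SendWebRequest();

            if (request.isNetworkError || request.isHttpError)
            {
                Debug.LogWarning("Request failed: " + request.error);
            }
            else
            {
                Debug.Log("Response: " + request.downloadHandler.text);
            }
        }
    }
}
```

The `using` block disposes the request once the coroutine has finished with it.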
Thanks for your reply! We will definitely be giving useHttpContinue = false a try.
In our experimenting with it, we noticed that even though that property defaults to true, we weren’t seeing the Expect header getting set in all circumstances.
Would you happen to know what the criteria are for that header to be added on iOS and Android? We were thinking it might have something to do with the size of the payload.
For these problematic requests that we are seeing, the request body is often only around 229 bytes in size.
On Android, when examining the requests with Charles, it looks like the Expect: 100-continue header is set only for requests that are over 1024 bytes in length. We don’t see it being set for smaller requests, which we are also having issues with, so I guess it’s unlikely to be the exact problem here.
Is there anything else that I can try or provide to help debug?
On the client-side, since we’ve defined a timeout for the requests of 10 seconds, isNetworkError is true for the UnityWebRequest and the error message is Request timeout which makes sense.
The problem is the request hangs on the backend until it gets killed after 20 seconds (even though it has timed out on the client-side).
Would you happen to know what the expected behaviour should be in this case?
I’d start by increasing the timeout. With a poor connection and a slow DNS lookup, 10s is a very low timeout. HTTP specs recommend 2 minutes IIRC.
In UWR, the timeout is not an inactivity timeout; it is the overall time allowed to perform the entire request.
Increasing the timeout may be a good idea even though the user experience may be hampered if they have to wait too long. We do have a retry mechanism in place as shown in the code above.
Setting aside the fact that the timeout is low, though, is it not strange that the request continues to hang on the backend after it has timed out on the client? Is this just a symptom of the user’s connection being unstable, so that the backend could not be informed that the socket was closed?
Do you know what the difference was with Unity 2018 that meant that this behaviour didn’t occur?
There were under-the-hood implementation changes for both Android and iOS, but I don’t know of a reason why such failures would start; possibly they are just more visible on poor connections.
If you want to keep a low timeout, you should monitor things like upload and download progress (bytes sent and received) and apply your timeout only when no progress is being made. On a poor connection you certainly don’t want to abort and retry the entire request if it is progressing, no matter how slowly.
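A rough sketch of the “no progress made” idea, as plain C# with no Unity dependency. `StallWatchdog` and its members are hypothetical names made up for this example, not part of UnityWebRequest:

```csharp
using System;

// Hypothetical helper: reports a stall only when the byte count has not
// changed for a given number of seconds, instead of limiting total time.
public sealed class StallWatchdog
{
    private readonly double stallSeconds;
    private ulong lastBytes;
    private double lastProgressTime;

    public StallWatchdog(double stallSeconds, double startTime)
    {
        if (stallSeconds <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(stallSeconds));
        }
        this.stallSeconds = stallSeconds;
        lastBytes = 0;
        lastProgressTime = startTime;
    }

    // totalBytes: bytes sent plus bytes received so far.
    // now: current time in seconds, from any monotonic clock.
    // Returns true when no bytes have moved for at least stallSeconds.
    public bool IsStalled(ulong totalBytes, double now)
    {
        if (totalBytes != lastBytes)
        {
            // Progress was made, so restart the inactivity window.
            lastBytes = totalBytes;
            lastProgressTime = now;
            return false;
        }
        return now - lastProgressTime >= stallSeconds;
    }
}
```

In a coroutine this could be fed once per frame with `request.uploadedBytes + request.downloadedBytes` and `Time.realtimeSinceStartup`, calling `request.Abort()` when it returns true, with `request.timeout` left at 0 or set much higher so the overall limit does not fire first. Note that time spent on DNS lookup and connecting also counts as “no progress” in this sketch, so the stall window should allow for that.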
That definitely makes sense, I appreciate the suggestion!
In our next release we will be adding some monitoring for uploadedBytes and downloadedBytes to hopefully better understand these timeout cases we are seeing. Since we are seeing it hang without a request body on the backend, I suspect uploadedBytes would be zero but I’m hoping to confirm that soon.
@Aurimas-Cernius I noticed a line under the Improvements section in the Unity 2020.3.16f1 release that I was wondering if you knew any more about:
“Networking: UnityWebRequest on iOS no longer uses operation queue for uploads, upload data will request by a callback from system.”
Would you happen to know the motivation for this change?
I built a sample project to take a peek at the code in UnityWebRequest.mm and it looks to me that the operation queue is still being used for uploads based on:
But maybe a different operation queue is being talked about for this change log note?
Thanks in advance for any extra information you are able to provide! We are just rolling out a new build with monitoring for uploadedBytes and downloadedBytes which we are hoping will continue to help shed some light on the situation we are facing.
While the problem still exists, just wanted to note that our team updated one of our games from 2019.4.30f1 to 2020.3.18f1 and we saw a significant improvement in this issue. I suspect that has come from this changelog item:
iOS: Reduced memory usage for small uploads in UnityWebRequest. (1355235)