Isn’t this discussion in the context of game streaming? Under perfectly ideal conditions you can get down to ~50 ms of input-to-display latency with streaming, but more realistically you’re going to be around 100 to 200 ms even with Stadia, because you’re not playing on a server within 5 miles over a wired Google Fiber connection.
That might seem crazy high, but on consoles with a TV left at its default settings, that’s the kind of latency you’re getting anyway (or worse). On PC, with a super high-end machine running a game at >144 Hz, a very fast gaming monitor* from the last year or two, and gaming peripherals, you can get down to about 30 ms.
* As an aside, the “1ms response time” thing you see slapped on monitors is 100% marketing bull. There’s no industry standard for how to measure response time, so every LCD made in the last 20 years qualifies as a “1ms response time” monitor by the metrics manufacturers use: almost any LCD panel overdriven hard enough can go from black to white in 1 ms or less, but that’s not a realistic use case. Response time is also completely separate from input lag!
All streaming services are trying to reduce the time it takes for the user’s inputs to reach their servers, for the game to update & render, for the rendered image to be compressed, and for the result to be sent back. There’s a ton of latency in normal PCs & desktop OSs that a mobile device or dedicated thin client can skip, so that alone gets rid of a few ms.

Then there’s the question of how far the data has to travel over the internet, which is ultimately bounded by the speed of light. The number of nodes the data passes through before reaching its destination can also dramatically affect both the overall distance and the latency, so these services run custom networks that try to get packets off the “public” internet as early as possible, giving them as straight a shot to the data centers as they can manage.

When running & rendering the game, they’re likely not on hardware where the game takes the usual 16 ms to update and another 16 ms to render (which is what PCs and consoles do), but on CPUs and GPUs powerful enough to do both combined in under 5 ms. Then custom video compression hardware spits out a compressed image in a few more ms, and it goes back over that custom network to get as close to you as possible before crossing the public internet and your ISP’s network again.
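To make that budget concrete, here’s a rough back-of-the-envelope breakdown in Python. Every stage and number below is a guess for the sake of illustration, not a measured figure from Stadia or anyone else:

    # Illustrative end-to-end latency budget for a streaming setup.
    # All numbers are made up for the example, not measured data.
    budget_ms = {
        "input capture on thin client":        5,
        "uplink + ISP to custom network edge": 15,
        "custom network to data center":       15,
        "simulate + render (fast server GPU)":  5,
        "hardware video encode":                5,
        "data center back to ISP edge":        15,
        "downlink + decode on client":         20,
        "display scanout":                     10,
    }

    for stage, ms in budget_ms.items():
        print(f"{stage:38s} {ms:3d} ms")
    print(f"{'total':38s} {sum(budget_ms.values()):3d} ms")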
With all that, they’re still struggling to get much better than 100 ms in the real world.
The idea with speculation is that you make a guess at what the player is going to do before they do it, start running the game as if they had done that action, and rewind the game if their real inputs don’t match. All of this is to try to get perceived latency below 100 ms.
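In toy form, the speculate / rewind / fast-forward loop is something like the following, where the entire “game state” is just a single number. This is a minimal sketch of the idea, not anyone’s actual implementation:

    # Toy sketch of input speculation with rollback. The whole "game" is a 1-D position.
    def simulate(pos, move):
        return pos + move                 # one tick of "game logic"

    def predict(last_move):
        return last_move                  # naive guess: the player keeps doing the same thing

    history = []                          # (tick, guessed_move, pos_before_tick)
    pos, last_move = 0, 1

    # The server runs ahead on guesses...
    for tick in range(5):
        guess = predict(last_move)
        history.append((tick, guess, pos))
        pos = simulate(pos, guess)

    # ...then the real input for tick 2 arrives late and turns out to be different.
    real_tick, real_move = 2, -1
    _, guess, saved = history[real_tick]
    if guess != real_move:
        pos = simulate(saved, real_move)          # rewind to tick 2, replay it correctly
        for _, g, _ in history[real_tick + 1:]:
            pos = simulate(pos, g)                # fast-forward back to "now"
    print(pos)                                    # corrected present state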
There are a few ways to handle how that’s exposed to the player. The original paper I linked to above sends a cube map plus 4 alternative futures, the latter in the form of lower-quality “flat” images that can be projected against the cube map. If the player’s input matches, or is at least close to, one of the “speculative” alternative futures, the client shows that image instead of the main one. Then the server rewinds the game, applies the user’s real input, and fast-forwards to “now”. This is enormously expensive in terms of server compute and bandwidth, but has the benefit that as soon as you shoot your gun or swing your sword you see visual feedback similar to what you expected, without having to wait for the round-trip latency.
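Very roughly, the client-side choice between those streams might look like the sketch below. I’m assuming each frame arrives tagged with the input it was rendered for; the tags and the distance metric are made up, this is not the paper’s actual code:

    # Sketch: pick which pre-rendered speculative frame to show the player.
    def input_distance(a, b):
        # crude similarity between two (aim_x, aim_y, fire_button) input tuples
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) + (0 if a[2] == b[2] else 10)

    def choose_frame(main, alternatives, actual_input, tolerance=3):
        """main/alternatives are (predicted_input, image) pairs; returns the image to show."""
        best = min(alternatives, key=lambda f: input_distance(f[0], actual_input))
        if input_distance(best[0], actual_input) <= tolerance:
            return best[1]       # close enough: show that speculative future locally
        return main[1]           # otherwise show the main stream and wait for the
                                 # server to rewind with the real input and catch up

    main = ((0, 0, 0), "main_stream_frame")
    alternatives = [((5, 0, 0), "strafed_right_frame"), ((0, 0, 1), "fired_weapon_frame")]
    print(choose_frame(main, alternatives, actual_input=(0, 0, 1)))   # -> fired_weapon_frame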
Later versions of the paper seem not to send all 5 video streams; instead they rewind the game and fast-forward to apply the inputs the player intended, using speculation to limit how often that has to happen. You just see a single video feed, and when you shoot your gun you simply miss the first few frames of the animation: the video you see locally skips ahead as if you’d pressed the button earlier. Things still react immediately, but if a game has effects or animations in those first few frames you’ll never see them. Oddly, this might actually “feel” better than seeing every frame; skipping frames to make something feel punchier is a common technique in movie action scenes. This is likely what Google Stadia would be doing. The human brain is weird too, and if you’re not actively looking for it, your memory will back-fill to make you think you saw everything happen anyway.
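A tiny illustration of why those first frames disappear, with made-up numbers:

    # When the "fire" input is applied retroactively, the first few frames of the
    # muzzle-flash animation are never encoded at all. Numbers are invented.
    TICK_MS = 16
    input_delay_ms = 80                        # how late the press reaches the server
    ticks_skipped = input_delay_ms // TICK_MS  # = 5 ticks of animation already "spent"

    fire_animation = [f"muzzle_flash_{i}" for i in range(10)]
    visible_frames = fire_animation[ticks_skipped:]
    print(visible_frames)   # frames 0-4 never show up; the stream just skips ahead
                            # as if the trigger had been pulled earlier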
Not exactly correct, specifically your use of the term “authority”. Most competitive games run simultaneous simulations on both the local player’s hardware and the server, and the server validates what the local client claims it’s doing. The person with the lowest ping does usually have an advantage, but some games will actually apply high-ping players’ actions to the simulation “back in time” to account for their higher ping. Games that allow two players to kill each other simultaneously, for example, may be taking each player’s ping into account for their attacks, effectively letting them get a couple of shots off “after” they died by pretending those shots happened earlier. But almost no multiplayer game today gives full authority to the client, because it’s just too easy to exploit.
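A rough sketch of that “back in time” trick, usually called lag compensation. The position history and the hit test here are invented purely for illustration:

    # The server keeps a short history of player positions and evaluates a shot
    # against the world as it looked when the high-ping shooter actually fired.
    position_history = {
        "victim": [(t, t / 10.0) for t in range(0, 401, 50)],   # (timestamp_ms, x)
    }

    def position_at(player, when_ms):
        # most recent recorded position at or before `when_ms`
        return [x for t, x in position_history[player] if t <= when_ms][-1]

    def resolve_shot(server_now_ms, shooter_ping_ms, aim_x, target):
        fired_at = server_now_ms - shooter_ping_ms      # when the shot "really" happened
        return abs(position_at(target, fired_at) - aim_x) < 1.0

    # Without the rewind the victim is at x=40 "now" and the shot would miss;
    # rewound to when the 150 ms ping player fired, they were at x=25 and it hits.
    print(resolve_shot(server_now_ms=400, shooter_ping_ms=150, aim_x=25.0, target="victim"))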
However, the “peeker’s advantage” you mentioned is actually a direct artifact of speculation that already happens in modern multiplayer games! If you’re standing still behind a corner and then step out quickly, you will indeed step out and be able to see & shoot before your opponent can see you do it. If you just run around the corner, you don’t get this advantage, because the server and your opponent’s client are speculating that you’re going to run out anyway, so they see you at the same time as (or sometimes slightly before) you actually step out. It can even go the other way: you might think you stopped just before the corner, but your opponent sees you step out and then duck back. Again, the speculative prediction running on your opponent’s client knows you were moving in that direction, so it expects you to keep running, and that’s what it shows your opponent.
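A sketch of the extrapolation that produces those artifacts, with invented numbers:

    # With no fresh update yet, the opponent's client assumes you kept moving at
    # your last known velocity ("dead reckoning"). Purely illustrative.
    def extrapolate(last_pos, last_vel, ms_since_update):
        return last_pos + last_vel * (ms_since_update / 1000.0)

    # Last update said you were 0.3 m short of the corner, running at 5 m/s toward it:
    shown_to_opponent = extrapolate(last_pos=-0.3, last_vel=5.0, ms_since_update=100)
    print(shown_to_opponent)   # 0.2 -> your opponent already sees you past the corner,
                               # even if you actually stopped just short of it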
However, in the case of your opponent shooting you the moment you “stepped out”, most modern games will validate this against the server’s version of reality and see that no, you stopped before walking past the corner, so the other player’s shots will not do damage. That doesn’t stop people from feeling like they got killed when they shouldn’t have, but that comes down to a combination of the peeker’s advantage and the fact that some of your body is usually visible around a corner before you can see around it.
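And a sketch of that server-side check, which is what keeps the phantom peek from counting (again, invented numbers):

    # The server compares the shot against its own record of where you actually
    # were, not against what the shooter's extrapolating client displayed.
    corner_x = 0.0
    pos_the_shooter_saw = 0.2      # extrapolated: already past the corner
    pos_on_server = -0.05          # reality: you stopped just short of it

    shot_looked_valid = pos_the_shooter_saw > corner_x
    shot_is_valid = pos_on_server > corner_x
    print(shot_looked_valid, shot_is_valid)   # True False -> no damage applied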
The big benefit streaming services have is the reduced latency between the clients and the server, and the possibility of there not even being a separation between the two. You could literally be playing the game “on the server”, as in a single executable does all of the simulation and the “clients” just render out the game state to stream to you. This removes a lot of the complexity of modern multiplayer games, and could legitimately make for multiplayer games that feel just as responsive as existing single-player console games. Again, ~150 ms of total latency for a single-player console game is pretty normal, so if that’s what you get from a streaming service, you could have a multiplayer game that feels exactly as responsive, because it is. There are also a lot of things modern multiplayer games just don’t bother attempting because of the inherent latency between the clients and the server that could now be possible, like significantly more dynamic and physics-based elements, as well as much larger player counts.
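A toy sketch of that “single executable” model; all the names are invented, and the point is just that there’s one authoritative simulation step followed by a render & encode per player, with nothing to predict or reconcile:

    # One simulation, one outgoing video stream per player.
    def apply_inputs(state, inputs):
        for pid, move in inputs.items():
            state["positions"][pid] += move

    def render(state, pid):
        # stand-in for rendering that player's camera view of the shared state
        return f"tick {state['tick']}: {pid} at {state['positions'][pid]}"

    state = {"tick": 0, "positions": {"p1": 0, "p2": 10}}
    streams = {"p1": [], "p2": []}            # stand-ins for each player's video encoder

    for inputs in [{"p1": 1, "p2": -1}, {"p1": 1, "p2": 0}]:
        state["tick"] += 1
        apply_inputs(state, inputs)           # single authoritative update for everyone
        for pid in streams:
            streams[pid].append(render(state, pid))   # then encode each player's own view

    print(streams["p2"])   # ['tick 1: p2 at 9', 'tick 2: p2 at 9']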