Is Cloud Code capable of handling this scenario? (player-hosted dedicated servers)

I asked this question in the Lobby forum but further research leads me to believe Lobby is not the right solution, but Cloud Code might be.

I am working on a multiplayer client/dedicated server game using ECS/Netcode for entities, etc. But my players are intended to host the provided dedicated server builds themselves.

The multiplayer menu screen from a client is intended to show a list of all other players' dedicated servers that are up and available to join. My dedicated servers read from a JSON file of server options from StreamingAssets, which contains user-defined information about the server that is intended to be published, such as server name, a short and long description, the IP/port connection info the clients will need to connect to it, the server's PvP type, whether or not you need a password to join, etc. (very reminiscent of NWN-style multiplayer server selection). These servers are meant to start up with no players connected, but allow players to connect and come and go as they please.
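As a rough illustration of the kind of options file described above, here is a minimal Python sketch that parses one. Every field name and value is an assumption made up for this example (the post only lists the kinds of information stored, not the actual keys), and the address is a documentation-only IP.

```python
import json
from dataclasses import dataclass


@dataclass
class ServerOptions:
    """Hypothetical shape of the published server options."""
    name: str
    short_description: str
    long_description: str
    address: str
    port: int
    pvp_type: str
    password_required: bool


def parse_server_options(text: str) -> ServerOptions:
    """Parse the JSON text; raises TypeError on missing or unknown keys."""
    return ServerOptions(**json.loads(text))


# Example file contents (all values are placeholders).
EXAMPLE = """
{
    "name": "Example Persistent World",
    "short_description": "Roleplay, casual",
    "long_description": "A longer blurb shown when the entry is selected.",
    "address": "203.0.113.10",
    "port": 7979,
    "pvp_type": "consensual",
    "password_required": false
}
"""

options = parse_server_options(EXAMPLE)
```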

The dedicated server will need a configurable address to talk to a 'listing service' (Cloud Code scripts or something else?) whose only job is to collect the server info from servers that choose to list to it, and then provide that info to the clients when they connect to the listing service to ask what public servers are available. The dedicated servers will ping the listing service occasionally to let it know it is still up, and to update the online player count (of that server) and/or any server info that has changed. The listing service won't be hosting anything beyond this basic centralized recordkeeping. My alternative is probably just a simple backend webservice to receive server records when they get sent to it and send a list of (non-stale) server records to clients when they ask for them. (with a configurable endpoint URL in another streaming assets file. I'd rather not hardcode myself into only one listing service choice)

The listing service won't be doing any of the client-server connections. I already have that handled through Netcode. It only needs to supply the connection data for the server the client chooses to connect to, and the client will then make that connection on its own.

My main concern about Cloud Code is that it does not necessarily host persistent data. Is that correct? I would need persistent data of some kind: either live in memory (meaning Cloud Code does not just shut down after execution, because it needs to wait for dedicated server pings to maintain the centralized active list), or in a shared Cloud Save file accessible only by Cloud Code.

If Cloud code cannot support remaining active in memory to await more calls to update its records, then a shared, server-side file update system would have to be used (only writable from cloud code). For example: a dedicated server comes up and runs cloud code to write an entry on its behalf with a timestamp to a server-hosted list/save file. Another dedicated server comes up and this repeats, appending to the same save file. A dedicated server pings the cloud code and it updates its timestamp in the file and any change in the amount of players it says are currently connected to it or data that describes this server entry. A client runs a different set of cloud code which reads the list/save file and hands back all server records that have a 'recent' timestamp (and are thus assumed to still be online) so that the client knows what servers are available to try and connect to. Any cloud code call may take the opportunity to purge the list of server records that have timestamps too far back in the past, assuming those servers are no longer running.
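The record-keeping flow described above can be sketched roughly as follows. This is only an illustration of the logic (register, keep-alive, purge stale entries, list recent ones). The class and field names are hypothetical, the 180-second staleness window is an arbitrary assumption, and the plain dict stands in for whatever shared store would actually hold the list.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServerRecord:
    server_id: str
    name: str
    address: str
    port: int
    player_count: int
    last_seen: float  # timestamp of the most recent register/keep-alive


class ServerList:
    """Hypothetical listing store; the dict stands in for a shared save file."""

    def __init__(self, stale_after: float = 180.0,
                 clock: Callable[[], float] = time.time):
        self.stale_after = stale_after  # assumed window, not a known value
        self.clock = clock
        self.records: Dict[str, ServerRecord] = {}

    def upsert(self, server_id: str, name: str, address: str,
               port: int, player_count: int = 0) -> None:
        """Register a server or refresh its entry (the keep-alive ping)."""
        self.purge()  # any call may take the opportunity to drop stale entries
        self.records[server_id] = ServerRecord(
            server_id, name, address, port, player_count, self.clock())

    def remove(self, server_id: str) -> None:
        """Graceful shutdown: the server removes its own entry."""
        self.records.pop(server_id, None)

    def purge(self) -> int:
        """Delete entries whose timestamp is too old; return how many."""
        cutoff = self.clock() - self.stale_after
        stale = [sid for sid, r in self.records.items() if r.last_seen < cutoff]
        for sid in stale:
            del self.records[sid]
        return len(stale)

    def list_live(self) -> List[ServerRecord]:
        """What a client would receive: entries with a recent timestamp."""
        cutoff = self.clock() - self.stale_after
        return [r for r in self.records.values() if r.last_seen >= cutoff]
```

The clock is injectable only so the staleness behaviour can be exercised without waiting in real time.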

Is this possible to do with cloud code?

Hello! What you describe should be achievable using Cloud Save Game Data. It allows you to store private data that is only accessible through Cloud Code scripts. Each dedicated server can create and maintain its own dedicated Game Data record, and you can use the Queries feature to list records matching your criteria (age, number of players, etc.). Keeping track of which servers are no longer live and should be cleaned up from the records is tricky though, as age might not be a good heuristic for it. Ideally your servers would clean up after themselves when they shut down, though I understand you might not be able to rely on this always happening.

Having said that, I'm curious to learn why Lobby and Relay didn't fit the bill here. At least in principle these two products should solve your problems in a better way than what can be achieved by wiring it up manually using Cloud Code and Cloud Save. What was it about them that made you think they are not the right fit for you?

Yes, but not a very efficient use of the service.

You must ensure that the dedicated server only makes calls to Cloud Code as a player, not as an "authority" with your service account.

A new dedicated server would call a Cloud Code method informing it about the server going online plus any parameters. The Cloud Code method can then store this info in Cloud Save. There ought to be a keep-alive call every minute or so in order to flag the server as still online.

You'd have to have scheduled Cloud Code method calls which clean up any registered servers that haven't sent a keep-alive for more than a minute.

Another user-level Cloud Code method call could then provide a (filtered) list of available servers.

Basically just as you outlined, if I understood correctly. But here's the catch: run the numbers! Try to estimate the costs for these calls based on some estimations or assumptions such as number of servers online in a day and how long CC calls will take.

You're most likely going to spend far less if you create a simple webservice with a SQL backend database to do the same thing: register a server, update server, cleanup servers frequently, and request list of online servers. This ought to be relatively straightforward to implement with, say, PHP and MySQL (definitely not "files").
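To make those four operations concrete, here is a minimal sketch of the data layer only. It uses Python with SQLite purely as a stand-in for the PHP and MySQL setup suggested above; the table layout, column names, and function names are all assumptions for illustration, and the HTTP layer and any authentication are left out.

```python
import sqlite3
import time

# Hypothetical schema: one row per listed dedicated server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS servers (
    server_id    TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    address      TEXT NOT NULL,
    port         INTEGER NOT NULL,
    player_count INTEGER NOT NULL,
    last_seen    REAL NOT NULL
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_server(conn, server_id, name, address, port, player_count, now=None):
    """Register a server, or update it on a keep-alive call."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO servers VALUES (?, ?, ?, ?, ?, ?)",
        (server_id, name, address, port, player_count, now),
    )
    conn.commit()


def cleanup(conn, max_age, now=None):
    """Delete servers not seen within max_age seconds; return rows removed."""
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM servers WHERE last_seen < ?", (now - max_age,))
    conn.commit()
    return cur.rowcount


def list_online(conn, max_age, now=None):
    """Return (name, address, port, player_count) for recently seen servers."""
    now = time.time() if now is None else now
    return conn.execute(
        "SELECT name, address, port, player_count FROM servers "
        "WHERE last_seen >= ? ORDER BY name",
        (now - max_age,),
    ).fetchall()
```

Filtering by timestamp in `list_online` means clients never see stale entries even if `cleanup` runs infrequently.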

Cloud Code is effectively C-style methods that execute and return; they have no static variables or access to class fields of any kind. So there is no persistence within Cloud Code. If you need to persist data it has to be stored elsewhere, for instance Cloud Save.

A Cloud Code method has a timeout of 15 seconds. You can certainly await calls, which does not factor into this timeout (await = CPU idle) but all remaining processing time cannot be longer than the timeout or the process will be killed and you get a failure response.

If you were to purposefully run a CC method for 15 seconds (excluding awaits) you'd spend about 1,000 times more in "compute time" than if it were to complete in 15 ms. The cloud method's run-time can make a huge difference in cost.
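The ratio above is simple arithmetic, sketched below. The call count is a made-up placeholder, and no pricing is applied because the actual billing rates are not stated in this thread.

```python
def compute_seconds_per_day(calls_per_day: int, ms_per_call: int) -> float:
    """Total busy compute time per day, in seconds, for a given call volume."""
    return calls_per_day * ms_per_call / 1000


# 15 seconds versus 15 milliseconds of busy time per call.
ratio = 15_000 / 15

# Hypothetical volume: 144,000 calls per day (a placeholder, not a measurement).
fast_total = compute_seconds_per_day(144_000, 15)
slow_total = compute_seconds_per_day(144_000, 15_000)

print(ratio)       # 1000.0
print(fast_total)  # 2160.0
print(slow_total)  # 2160000.0
```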

If you're doing all this in order to save money by not hosting your own dedicated servers it might still be cost effective, since server hosting is the biggest chunk in terms of live service costs. But IMO Cloud Code can come in at a pretty high cost too if you don't carefully plan its use ahead of time and implement sensible limits, specifically to prevent costly issues. Imagine a user scripting server launches, but it happens multiple times per minute due to a bug or malicious intent, and that script being shared among your community of players.
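One possible form of such a limit is a minimum interval between accepted calls per caller. The sketch below is an assumption about how that could look, not a feature of Cloud Code; the class name, the 30-second interval in the usage lines, and the in-memory dict are all hypothetical, and a real service would need to keep this state in its persistent store.

```python
from typing import Dict


class MinIntervalLimiter:
    """Rejects calls from the same caller that arrive too close together."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self.last_accepted: Dict[str, float] = {}

    def allow(self, caller_id: str, now: float) -> bool:
        last = self.last_accepted.get(caller_id)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon: drop the call before doing any real work
        self.last_accepted[caller_id] = now
        return True


limiter = MinIntervalLimiter(min_interval_s=30.0)
first = limiter.allow("server-a", now=0.0)    # accepted
second = limiter.allow("server-a", now=5.0)   # rejected, only 5 s later
third = limiter.allow("server-a", now=31.0)   # accepted again
```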

Hi @CodeSmile

Just to clarify this point: awaiting a request response for 15 seconds would not cost a full CPU second per second, as the CPU does very little during this time. If the request was processing a response, e.g. looping and manipulating data for 15 seconds, then that would be a costly invocation as the CPU is being fully utilised.

I'm aware, I meant to clarify it in the post, thanks for the reminder. ;)

I'm not opposed to using Lobby or Relay. They just did not feel like a good fit: I could not find any samples that really addressed my scenario, and there seemed to be a lot of focus on things I didn't need.

"Relay facilitates a multiplayer game session without a dedicated game server (DGS) or the complications of direct peer-to-peer connections."
I explicitly would be using dedicated servers, multiples, though the dedicated servers themselves won't talk to each other and are entirely independent. I'm already handling and planning to handle direct client-to-server network traffic via Netcode for Entities between server builds and client builds. The players who choose to run their own servers (that I published the exes for) will be dealing with the run costs and bandwidth requirements related to that. I will not use any solution that puts a burden on players by requiring extra steps, such as setting up their own UGS-related service accounts with Unity. I'm not assuming that would be needed, just stating it as a requirement. (Generic player authentication accounts, either via Unity or Steam, etc., probably will be a thing though.)

"Lobby facilitates grouping players and configuration settings before game sessions."
I won't need to group players or configure settings before game sessions. Players who host their own dedicated server will do that themselves via basic config file tools I provide. This part is already working. A server is required to be capable of launching with zero players, listening while dormant for incoming connections, and letting players come and go as they please. It will not shut down when the players leave. There is no concept of a 'game has ended' except for when the host deliberately decides to shut the server down.

As previously mentioned, these will be highly similar to NWN1/2 style short-lived/restartable campaigns or long-lived persistent worlds. As such I also need to support multiple levels of player permissions, 'player', 'gamemaster', and 'admin'. I do this via a password sent when the player chooses to connect to a target server. I already have this working. Gamemaster/Admin is required to receive a matching password/authentication upon connection. A player password is optional, only required if the custom configuration settings of the host says one is needed. I do not know if Lobby/Relay even supports this concept. (I don't mean lobby codes, but the identifiable variable permission level stuff)

A player/designer/host will launch a server that does whatever they've customized it to do given the modding tools I would expose. They may want to leave it up only for a short time for a prearranged (repeatable) play session with select invited friends, or open it up to everyone and just leave it up and running 24/7 for come-and-go play. I, as the developer, will not be hosting any of these. The one thing I do need to provide is the aforementioned listing service so that players can discover what servers are available to join. The listing service is actually entirely optional; it is a convenience for discoverability. I already have direct client-to-server connection working, you just need to know where to connect to in advance in that case. Again, the listing service does not need to handle making the connections, just handing the connection info off to the client and letting the client do that on its own.

All I am really doing with this part right now is planning ahead and thinking about what solutions I will need for this piece in the future and which unity services are viable or not. I have plenty of other work that will probably take a long time to complete related to everything else mentioned. I've long suspected a simple webservice and database would be the way to go for the listing service, I just wanted to see what other options might be possible and if any fit within the free tier unity offers.

I agree, but I have no data at this time; nothing is even finished, let alone out yet. However, the listing service is not a critical or high-update-rate service. I don't know how many future dedicated servers will exist and be running simultaneously, but they'd only need a call on start-up, an 'I'm still alive' call, possibly spaced out as much as a minute or longer, and an 'I'm shutting down' call (when this happens gracefully). A client would only ever ask for a read-only server list when they open that screen. Again, I doubt that screen would need to be refreshed at a high rate; it could be spaced by 30 seconds or more as it's not critical. Once a client chooses to connect to a server (or otherwise leaves the screen) it does not need to continue requesting server lists anymore. A client also has no real reason to linger on that screen. I'm not (at this time) planning to add any friends list or lobby chat or things of that nature. It's not needed for a minimum viable product.
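Even without real data, the call pattern described above is enough for a back-of-the-envelope volume estimate. The sketch below uses the 60-second keep-alive and 30-second refresh intervals from the post; the server count, session count, and time spent on the list screen are invented placeholders to be replaced with real estimates.

```python
SECONDS_PER_DAY = 86_400


def server_calls_per_day(servers: int, heartbeat_interval_s: int,
                         restarts_per_day: int = 1) -> int:
    """Keep-alive calls plus one start-up and one shutdown call per restart."""
    heartbeats = servers * (SECONDS_PER_DAY // heartbeat_interval_s)
    lifecycle = servers * restarts_per_day * 2
    return heartbeats + lifecycle


def client_calls_per_day(sessions: int, seconds_on_screen: int,
                         refresh_interval_s: int) -> int:
    """One request when the screen opens, plus one per completed refresh."""
    per_session = 1 + seconds_on_screen // refresh_interval_s
    return sessions * per_session


# Placeholder inputs: 100 always-on servers, 5,000 list-screen visits per day,
# 45 seconds spent on the screen per visit.
server_calls = server_calls_per_day(100, heartbeat_interval_s=60)
client_calls = client_calls_per_day(5_000, seconds_on_screen=45,
                                    refresh_interval_s=30)
print(server_calls, client_calls, server_calls + client_calls)
```

With these placeholder inputs, keep-alives from always-on servers dominate the total, so the heartbeat interval is the main lever on call volume.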

Hey @samg-unity !

I know this is slightly off topic of the main post but this is exactly what I was looking for in terms of long run times in cloud code modules. You mention async awaiting a request, such as awaiting a response from an external API call or awaiting a response from Cloud Save, does not cost a full cpu second per second since the cpu is rather idle during this time. Does this mean that cloud code modules can run for longer than the 15 second timeout if it includes await calls similar to what @CodeSmile said?

Could you say, send and await an external API request that takes like 20-30 seconds to get a response, then process it once the response comes in if the processing takes less than a few seconds? Meaning you only are technically running the full cpu for a few seconds. Or would the module call timeout due to waiting too long? I have yet to get a clear answer on something like this, so any insight would be great!

It would only take a simple test. ;)

I think the docs are pretty clear though. If a CC method runs for longer than 15s it gets killed, and I would be surprised if async/await would prolong this "kill timer". Otherwise the method would block the resource (thread, memory) for longer than necessary.

An API request or even multiple successive requests taking more than a few seconds total would be rather unusual.

Hi @Camicus27

With this particular use case, the answer would be no: the lifetime of the module invocation would have exceeded the given 15 seconds, so the external API request connection would be cancelled/closed and there would be no way for the response to be handled.

It is not the case that an invocation is given a total of 15 CPU seconds to utilise. An invocation is currently given a total of 15 seconds to complete, and you will be charged for the amount of CPU usage that occurred during that period of time.
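The distinction can be illustrated with a small Python asyncio analogy. This is not how Cloud Code is implemented; it only shows that a wall-clock limit cancels an invocation even when nearly all of its time is spent idle in an await. The function names are hypothetical and the limits are scaled down from 15 seconds so the example runs quickly.

```python
import asyncio


async def slow_external_request(delay_s: float) -> str:
    # Stands in for an idle wait on a remote API; almost no CPU is used here.
    await asyncio.sleep(delay_s)
    return "response"


async def invoke(request_delay_s: float, wall_clock_limit_s: float) -> str:
    """Run the handler under a wall-clock limit, like an invocation timeout."""
    try:
        return await asyncio.wait_for(
            slow_external_request(request_delay_s), timeout=wall_clock_limit_s)
    except asyncio.TimeoutError:
        # The pending request is cancelled, so its response is never handled.
        return "killed: wall-clock limit exceeded"


# Idle wait longer than the limit: cancelled despite using almost no CPU.
too_slow = asyncio.run(invoke(request_delay_s=0.2, wall_clock_limit_s=0.05))
# Idle wait shorter than the limit: completes normally.
in_time = asyncio.run(invoke(request_delay_s=0.01, wall_clock_limit_s=0.2))
```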

Hopefully that clarifies the distinction?!

But just to gain some more insight into your particular use case, would these long running API requests be few and far between or would you want them to be executed fairly regularly by players?