Hello,
I like to start my questions with a bit of context, but if you don't care much about it, please jump directly to the TL;DR at the end and read the two questions below it.
I'm working on a game with a world map (similar to Risk or the Paradox games) where each tile or province is represented by a unique color that serves as its own ID, e.g. Color(255,0,0) is ID 1, Color(254,0,0) is ID 2, …
This method allows me to represent up to 256 * 256 * 256 (16,777,216) different provinces in the game (more if we consider alpha, but three channels are already more than enough).
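To make the capacity concrete, here is a minimal sketch of packing an 8-bit-per-channel color into a single integer ID and back. The bit ordering (red in the low byte) is an assumption for illustration only; the post's own example counts down from red 255, so the actual mapping in the game may differ. Either way, three 8-bit channels give exactly 256³ distinct IDs.

```python
# Hypothetical 24-bit packing of an RGB color into a province ID.
# NOTE: this ordering is an assumption; the post's own example counts down
# from red (Color(255,0,0) -> 1, Color(254,0,0) -> 2).

LEVELS = 256                # 8 bits per channel
MAX_IDS = LEVELS ** 3       # number of distinct RGB colors


def color_to_id(r: int, g: int, b: int) -> int:
    """Pack an 8-bit-per-channel RGB color into one integer in [0, MAX_IDS)."""
    for channel in (r, g, b):
        if not 0 <= channel < LEVELS:
            raise ValueError(f"channel out of range: {channel}")
    return r | (g << 8) | (b << 16)


def id_to_color(pid: int) -> tuple:
    """Inverse of color_to_id: recover the (r, g, b) triple from an ID."""
    if not 0 <= pid < MAX_IDS:
        raise ValueError(f"id out of range: {pid}")
    return (pid & 0xFF, (pid >> 8) & 0xFF, (pid >> 16) & 0xFF)


print(MAX_IDS)                               # 16777216
print(id_to_color(color_to_id(12, 34, 56)))  # (12, 34, 56)
```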
When a user touches a province, I sample a lookup texture that stores the province shapes, each filled with its unique color. This texture tells me which province the player is interacting with.
Afterwards, a compute shader (CS) writes to a 4096 x 4096 Index Texture (4096 x 4096 = 256 x 256 x 256 = 16,777,216 pixels) in which each pixel maps to a given province. Among other purposes, this lets the map shaders know whether a province should be highlighted or which faction it belongs to, along with other dynamic data.
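Because 4096 × 4096 equals 256³, every possible ID gets exactly one texel. A minimal sketch of the ID-to-texel mapping, assuming a plain row-major layout (ID 0 at the origin, wrapping every 4096 IDs); the function names are hypothetical and the real shader may lay the data out differently.

```python
# Hypothetical mapping between a province ID and its texel in the Index Texture.
# Assumes row-major layout: ID 0 at (0, 0), ID 1 at (1, 0), ..., wrapping every
# `width` IDs to the next row.

INDEX_WIDTH = 4096


def id_to_texel(pid: int, width: int = INDEX_WIDTH) -> tuple:
    """Return the (x, y) texel that stores the data for province `pid`."""
    return pid % width, pid // width


def texel_to_id(x: int, y: int, width: int = INDEX_WIDTH) -> int:
    """Inverse of id_to_texel."""
    return y * width + x


assert INDEX_WIDTH * INDEX_WIDTH == 256 ** 3  # one texel per 24-bit color
print(id_to_texel(4097))        # (1, 1)
print(id_to_texel(256**3 - 1))  # (4095, 4095)
```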
Below I explain my current procedure for updating this Index Texture. Note that in a real case I would keep the Index Texture as small as possible for the current number of provinces rather than use such a massive texture; the whole idea here is to push the limits of optimization and see what can be done.
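For the "keep it as small as possible" case, the smallest square texture that still gives every ID its own texel has side ⌈√n⌉. A minimal sketch, assuming IDs are used directly as texel indices (so `slot_count` would be the highest ID plus one); rounding up to a power of two is shown only as an option, not a requirement stated in the post.

```python
import math

# Smallest square Index Texture that still gives every province its own texel.
# `slot_count` is the number of IDs that must be addressable (e.g. highest ID + 1
# if IDs are used directly as texel indices; an assumption of this sketch).


def smallest_square_side(slot_count: int) -> int:
    """Return ceil(sqrt(slot_count)) using exact integer arithmetic."""
    if slot_count < 1:
        raise ValueError("need at least one slot")
    return math.isqrt(slot_count - 1) + 1


def next_power_of_two(n: int) -> int:
    """Round up to a power of two, in case power-of-two sizes are preferred."""
    return 1 << (n - 1).bit_length()


side = smallest_square_side(5_000)
print(side, next_power_of_two(side))   # 71 128
print(smallest_square_side(256 ** 3))  # 4096
```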
Right now the Index Texture is updated by a CS to which I feed a buffer with the modified data. This buffer is kept small (I hardly expect more than 200 elements at any time), and each element holds two integers of encoded data, one of which includes the pixel it modifies. For each pixel of the texture, the CS loops over this buffer and, if it finds data for that pixel, copies it and stops.
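The per-pixel loop above can be mimicked on the CPU to see its cost. In this sketch the `Update` layout (a linear pixel index plus a packed payload) is a hypothetical reading of "two integers, one including the pixel". Since almost every texel finds no match, each one scans the whole buffer, so a full 4096 × 4096 pass with 200 elements is bounded by roughly 3.4 billion comparisons per dispatch.

```python
from dataclasses import dataclass

# CPU reference of the gather pass described above: one "thread" per texel,
# each scanning the whole update buffer. The field layout is hypothetical;
# the post only says each element is two encoded integers, one holding the pixel.


@dataclass(frozen=True)
class Update:
    pixel: int    # linear texel index, y * width + x
    payload: int  # packed province data (faction, highlight flag, ...)


def gather_pass(texture: list, updates: list) -> int:
    """Apply `updates` to `texture` in place, texel by texel.

    Returns how many buffer elements were inspected, to show the cost."""
    inspected = 0
    for texel in range(len(texture)):
        for u in updates:
            inspected += 1
            if u.pixel == texel:
                texture[texel] = u.payload
                break  # found data for this texel: copy it and finish
    return inspected


tex = [0] * 16  # a 4 x 4 stand-in for the Index Texture
cost = gather_pass(tex, [Update(3, 7), Update(10, 9)])
print(tex[3], tex[10], cost)  # 7 9 31

# Upper bound for the real case: every texel scans all 200 elements.
print(4096 * 4096 * 200)      # 3355443200
```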
Now to the questions. I'm thinking of ways to make this process as performant as possible, but I lack some knowledge about the inner workings of compute shaders and how Unity handles them, so some of these ideas may be useless. Any help is appreciated:
The first would be to keep the texture small (but of course we aren't doing that in this example).
The second could be to execute the CS at intervals (three or four times per second at most) instead of every frame. I feel this may improve performance, but I'm not completely sure. Does calling a CS with Dispatch() block the whole thread so that nothing else executes until the CS ends, or does the thread keep working, with synchronization done at the end of the engine's update loop?
Note that I never read back from the CS; I only write some small buffers to it.
If Dispatch doesn't block, maybe it is much better to keep calling it every frame instead, to keep things up to date and avoid any possible lag spike.
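Whichever way the blocking question is answered, the throttled option needs updates to be collected between dispatches. A minimal sketch of that batching, assuming a hypothetical `dispatch` callback that stands in for uploading the buffer and calling Dispatch (it is not a Unity API). Repeated writes to the same pixel are merged so the flushed buffer stays small; setting `rate_hz` to the frame rate turns it back into the every-frame alternative.

```python
from typing import Callable, Dict

# Sketch of the "dispatch only a few times per second" option: updates are
# collected every frame and flushed at most `rate_hz` times per second.
# `dispatch` is a hypothetical stand-in for the code that uploads the buffer
# and dispatches the compute shader; it is not a Unity API.


class UpdateBatcher:
    def __init__(self, rate_hz: float, dispatch: Callable[[Dict[int, int]], None]):
        self.interval = 1.0 / rate_hz
        self.dispatch = dispatch
        self.pending: Dict[int, int] = {}
        self.last_flush = float("-inf")

    def queue(self, pixel: int, payload: int) -> None:
        # A newer write to the same pixel replaces the older one, so the
        # flushed buffer never holds two entries for one pixel.
        self.pending[pixel] = payload

    def tick(self, now: float) -> bool:
        """Call once per frame with the current time in seconds.
        Returns True if a flush (dispatch) happened this frame."""
        if not self.pending or now - self.last_flush < self.interval:
            return False
        self.dispatch(dict(self.pending))
        self.pending.clear()
        self.last_flush = now
        return True


sent = []
batcher = UpdateBatcher(rate_hz=4, dispatch=sent.append)
for frame in range(60):            # one simulated second at 60 fps
    batcher.queue(pixel=frame % 5, payload=frame)
    batcher.tick(now=frame / 60)
print(len(sent))                   # 4
```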
The third option is to dispatch multiple CSs, each responsible for updating a smaller Index Texture (for simplicity, instead of one 4096 x 4096 image, we update four 2048 x 2048 images, each with its own, smaller buffer, although I feel the buffer size won't be the problem). Either way, I'm interested in knowing: is it possible to run multiple CSs at the same time, or will the CPU thread from which they are dispatched block until the last one finishes?
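Splitting the work this way means each buffer entry has to be routed to the texture that owns its pixel. A minimal sketch, assuming four 2048 × 2048 tiles numbered row-major from the origin (the tile numbering and function names are hypothetical); each tile's buffer would then feed its own dispatch.

```python
# Sketch of the "several smaller textures" option: route each update to one of
# four 2048 x 2048 tiles that together cover the 4096 x 4096 index space.
# Tile numbering (row-major, tile 0 at the origin) is an assumption.

FULL = 4096
TILE = 2048
TILES_PER_ROW = FULL // TILE  # 2


def route(pixel: int) -> tuple:
    """Map a linear index in the full texture to (tile_index, local_index)."""
    x, y = pixel % FULL, pixel // FULL
    tile = (y // TILE) * TILES_PER_ROW + (x // TILE)
    local = (y % TILE) * TILE + (x % TILE)
    return tile, local


def split_updates(updates):
    """Group (pixel, payload) pairs into one buffer per tile."""
    buffers = [[] for _ in range(TILES_PER_ROW ** 2)]
    for pixel, payload in updates:
        tile, local = route(pixel)
        buffers[tile].append((local, payload))
    return buffers


print(route(2048))             # (1, 0)
print(route(FULL * FULL - 1))  # (3, 4194303)
```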
Does anyone have ideas for a better approach to keep this whole process well optimized? Or do you think it will already run well? (I haven't tested it yet, as I haven't had time to start coding it, and I prefer to have all my doubts cleared before writing any code.)
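One alternative sometimes considered for this kind of sparse update is a "scatter" pass: dispatch one thread per buffer element instead of one per texel, and have each element write straight to its own pixel. The sketch below only shows the CPU-side logic under that assumption; whether it beats the per-pixel loop in Unity would need profiling. It also assumes the buffer holds at most one entry per pixel, because on a GPU two threads writing the same texel would race.

```python
# Sketch of an alternative worth measuring: a "scatter" pass with one thread per
# buffer element instead of one thread per texel. Each element writes straight
# to its own pixel, so work scales with the buffer length, not the texture size.
# Assumes at most one entry per pixel (duplicates would race on a GPU).


def scatter_pass(texture: list, updates: list) -> int:
    """Apply (pixel, payload) pairs directly. Returns number of writes."""
    for pixel, payload in updates:  # on the GPU: one thread per element
        texture[pixel] = payload
    return len(updates)


tex = [0] * 16
writes = scatter_pass(tex, [(3, 7), (10, 9)])
print(tex[3], tex[10], writes)  # 7 9 2
```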
Thank you very much
TL;DR
I'm updating a big 4096 x 4096 texture (you don't want to know why, dangerous stuff). I use a CS that loops over a small buffer of around 200 elements on average, each element being a struct of two integers. The CS writes the data of one of these elements to a given pixel if a certain (unimportant) condition is met. I only write some small buffers to the CS; I never read from it.
I want to know two things:
Does calling a CS with Dispatch() block the whole thread so that nothing else executes until the CS ends, or does the thread keep working, with synchronization done at the end of the engine's update loop?
Can multiple CSs run at the same time, or will the thread from which Dispatch is called block until the first one is done?