Update: To anyone reading this for the first time - I don’t do 27 at the same time, I’ve tried a maximum of 3 at the same time.
So I have a voxel world, and I’m using Loom to launch threads and fill chunks of my world with block data.
Doing this without threads (i.e. on the main thread) takes 6 seconds to fill the active area with terrain block data. (And of course the game locks up during this process.)
Here’s how I do it with threads: there are 27 chunks, and for each chunk I launch a thread that fills it with block data.
When I allow 1 thread to run at a time (i.e. one chunk after another), the game is completely smooth; the chunks with the most blocks take about 2 seconds, but the rest take practically zero seconds because they’re empty. And the whole process takes 13 seconds. Why longer??
I thought the point of multi-threading was to take advantage of multiple CPU cores? (Of which I have 4)
When I allow 2 threads to run at a time (i.e. 2 chunks will be filled at a time), the process takes 19 seconds!!
And when I allow 3 threads to run (so a total of 4, including the main thread) the process takes 31 seconds!!!
I’ve read that multiple threads can cause a performance decrease (when timed with the ‘wall clock’, i.e. real-world time) when we’re doing:
- CPU intensive tasks
- Heavy memory access
both of which apply to this process.
So is this normally the case in this situation??
Is it possible to get it done in 6 seconds again? Or do I just treat threads as the option to have a 13-second load time with the game still playable, instead of 6 seconds of the game locking up?
I think the heavy memory access might be the problem here. Also you are creating 27 threads?
http://www.mono-project.com/ThreadsBeginnersGuide#Data_Races
I assume you’re generating the chunks procedurally rather than loading data off disc. Is this the case?
It’s possible the threads are not running on separate hardware threads but just filling in idle time on the main thread. Can you verify if this is the case?
To reduce the total time taken you should only create as many worker threads as there are extra hardware threads and ensure that each worker thread runs on a separate hardware thread.
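Roughly this sort of shape, as a sketch (ChunkWorkerPool and the Action-based job queue are just stand-ins for however you represent the work, not Loom’s API):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Sketch only: run the chunk-fill jobs on a small pool of worker threads,
// sized to the spare hardware threads. Each "job" is just a delegate that
// fills one chunk's block data (CPU-only work, no Unity scene API calls).
public class ChunkWorkerPool
{
    private readonly Queue<Action> jobs = new Queue<Action>();
    private readonly object jobLock = new object();

    public void Run(IEnumerable<Action> fillJobs)
    {
        foreach (Action job in fillJobs)
            jobs.Enqueue(job);

        // Leave one hardware thread free for the main/game thread.
        int workerCount = Math.Max(1, Environment.ProcessorCount - 1);
        for (int i = 0; i < workerCount; i++)
        {
            Thread t = new Thread(WorkerLoop);
            t.IsBackground = true;
            t.Start();
        }
    }

    private void WorkerLoop()
    {
        while (true)
        {
            Action job;
            lock (jobLock)
            {
                if (jobs.Count == 0) return;   // queue drained, worker exits
                job = jobs.Dequeue();
            }
            job();                              // fill one chunk
        }
    }
}
```

The key point is that the number of worker threads is tied to Environment.ProcessorCount rather than to the number of chunks.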
@hippocoder - nope, I control how many threads run simultaneously. When I allow 1 thread to be created, that means the main thread is running, plus one extra thread for chunk generation, which will generate one chunk at a time. When I allow 2 threads, 2 chunks will be generated at a time, and so on.
It’s definitely not a data race, since each thread creates a totally new, separate chunk, each representing a unique and separate part of the world.
@Gibbonator: Yes I’m generating them with Unity’s Perlin function (not loading off disk).
When I view the core graph in the Task Manager, it seems as expected - As I add more threads, I see activity on more cores. So if more cores are doing work, why is it taking longer the more threads I add??
Unfortunately Loom doesn’t give me control on what hardware core the threads run on (as far as I know)…
With 4 cores and a decent machine, your threads should be able to eat that stuff up. I suspect there is something else going on here. Like others mentioned, 27 threads is a lot; that would hurt more than help. You should try limiting it to 4 or fewer and see how that works out.
If you try to use more threads than the hardware actually supports, the OS has to context-switch between them, which is expensive.
Also… can you use the Stopwatch class to time how long the code takes to generate a chunk’s worth of data NOT multithreaded, just the main function that does this? Basically, if you didn’t use Loom or multithreading, how long would it take your code to generate one chunk of data?
If it takes like… 3 ms, then you know your problem is not the time it takes to generate a chunk’s worth of data, and it must be elsewhere. And you are just talking about the chunk data, right? Not lighting or the actual mesh?
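Something like this (just a sketch; generateChunk is a stand-in for whatever function fills one chunk’s array):

```csharp
using System.Diagnostics;

// Sketch: time one chunk's worth of generation on the main thread, no Loom involved.
// "generateChunk" stands in for whatever function fills one chunk's byte array.
static long TimeOneChunk(System.Func<byte[,,]> generateChunk)
{
    Stopwatch sw = Stopwatch.StartNew();
    byte[,,] blocks = generateChunk();
    sw.Stop();
    UnityEngine.Debug.Log("One chunk: " + sw.ElapsedMilliseconds + " ms for "
                          + blocks.Length + " cells");
    return sw.ElapsedMilliseconds;
}
```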
I vaguely recall Loom…threading helper, right? With minepackage I queued up the chunks to draw using a queue, had a thread that monitored the queue, processed them, and resubmitted them to another queue for the update function to draw.
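Roughly this shape (a simplified sketch from memory, not the actual minepackage code; BuildChunkData and ApplyToScene are placeholders for the real generation and mesh-building steps):

```csharp
using System.Collections.Generic;
using System.Threading;
using UnityEngine;

// Simplified sketch of the two-queue idea: a worker thread drains the request
// queue, and Update() on the main thread drains the finished queue.
public class ChunkProcessor : MonoBehaviour
{
    private readonly Queue<Vector3> pending = new Queue<Vector3>();
    private readonly Queue<KeyValuePair<Vector3, byte[,,]>> finished =
        new Queue<KeyValuePair<Vector3, byte[,,]>>();
    private readonly object sync = new object();

    void Start()
    {
        Thread worker = new Thread(WorkerLoop);
        worker.IsBackground = true;
        worker.Start();
    }

    // Called from the main thread whenever a chunk needs generating.
    public void RequestChunk(Vector3 chunkPos)
    {
        lock (sync) pending.Enqueue(chunkPos);
    }

    private void WorkerLoop()
    {
        while (true)
        {
            bool hasWork;
            Vector3 chunkPos = Vector3.zero;
            lock (sync)
            {
                hasWork = pending.Count > 0;
                if (hasWork) chunkPos = pending.Dequeue();
            }
            if (!hasWork) { Thread.Sleep(5); continue; }   // idle until more requests arrive

            byte[,,] blocks = BuildChunkData(chunkPos);     // pure CPU work, no Unity scene calls
            lock (sync) finished.Enqueue(new KeyValuePair<Vector3, byte[,,]>(chunkPos, blocks));
        }
    }

    void Update()
    {
        // Main thread: pick up finished chunks and turn them into meshes here.
        lock (sync)
        {
            while (finished.Count > 0)
            {
                KeyValuePair<Vector3, byte[,,]> done = finished.Dequeue();
                ApplyToScene(done.Key, done.Value);
            }
        }
    }

    // Placeholder for the real block-fill code.
    private byte[,,] BuildChunkData(Vector3 chunkPos) { return new byte[16, 128, 16]; }

    // Placeholder for building the chunk's mesh on the main thread.
    private void ApplyToScene(Vector3 chunkPos, byte[,,] blocks) { }
}
```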
I don’t do 27 at the same time, I’ve tried a maximum of 3 at the same time. The times I quoted in my first post should answer your questions about the timing.
6 seconds for everything without threading, then 13 seconds when generating one chunk at a time on a single worker thread, while the main game runs concurrently on the main thread.
What could be happening?
Chunk generation is just the setting of values in a 3D byte array. I’m not even timing mesh generation, which is a separate pass.
And what I’ve got going on is pretty much exactly what you described in your minepackage.
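For a sense of what that fill pass looks like, here’s a simplified version (the noise scale, clamp and block IDs are made up for illustration, not my real values):

```csharp
using UnityEngine;

// Simplified fill pass: writing block IDs into a 3D byte array from 2D Perlin noise.
byte[,,] FillChunk(int chunkX, int chunkZ, int size, int height)
{
    byte[,,] blocks = new byte[size, height, size];

    for (int x = 0; x < size; x++)
    {
        for (int z = 0; z < size; z++)
        {
            float noise = Mathf.PerlinNoise((chunkX * size + x) * 0.02f,
                                            (chunkZ * size + z) * 0.02f);
            // Clamp because PerlinNoise can return slightly outside 0..1.
            int groundHeight = Mathf.Clamp((int)(noise * height), 0, height);

            for (int y = 0; y < groundHeight; y++)
                blocks[x, y, z] = 1;   // 1 = solid block, 0 = air (byte defaults to 0)
        }
    }
    return blocks;
}
```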
For the best performance you should have one thread running on each processor core, but you still need to manage them and check whether they need to do more work. When working with threads you can also run into race conditions, which are a big no-no!
As for how the CPU works: a core only executes one thing at a time. A single core NEVER runs two things at the exact same time, but because CPUs are so fast nowadays we as users never notice this. To show how that works, refer to the badly made ASCII art below:
CPU-Core-1
Thread 1: -- --- ------------
Thread 2: ---- -------- -----------
Thread 3: --- ---------
Imagine each “-” to be a single clock cycle; you can then see that threading does not necessarily make things faster.
What you possibly want to do is have one single thread manage all your chunks, and make sure only that thread touches them, to avoid race conditions. Also, if you want the absolute best performance, you could look into GPU programming, which is roughly 20% faster at a minimum.