Please help solve this "multiple producers" code problem.

  • Imagine a game in which farmers are harvesting apples from an orchard.
  • When each farmer picks an apple, they must add it to a specific bucket, based on its type, condition, ripeness, etc.
  • There are a variable number of farmers, and a variable number of buckets. Both counts could be very high.
  • Each farmer is represented by an Entity. Each bucket is also represented by an Entity.
  • The “farmers picking apples” step takes place over multiple, unrelated jobs. This cannot change.
  • Later in the game loop, other jobs might iterate over all the apples in a specific bucket.

Please help me figure out a performant way to do this, so that multiple jobs can add apples to multiple buckets, in parallel.

I’ve been struggling to figure out a reasonably fast solution. The main problem is usually multiple farmers adding to a single bucket in parallel.

Thank you for any suggestions!


This might be a use case where NativeMultiHashMap makes sense.
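
If it fits, the write side might look roughly like this sketch (Apple and the bucketing logic are placeholders; note that the map cannot grow through its ParallelWriter, so you have to reserve worst-case capacity up front; in newer Collections versions the type is named NativeParallelMultiHashMap):

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

public struct Apple { public int Id; }

[BurstCompile]
public struct PickApplesJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<Apple> Apples;
    // One shared writer; Add() is safe to call from many threads at once.
    public NativeMultiHashMap<int, Apple>.ParallelWriter Buckets;

    public void Execute(int i)
    {
        var apple = Apples[i];
        int bucket = apple.Id % 8; // stand-in for type/condition/ripeness logic
        Buckets.Add(bucket, apple);
    }
}

// Capacity must cover the worst case, since the parallel writer cannot resize:
// var buckets = new NativeMultiHashMap<int, Apple>(apples.Length, Allocator.TempJob);
// new PickApplesJob { Apples = apples, Buckets = buckets.AsParallelWriter() }
//     .Schedule(apples.Length, 64).Complete();
// Later: buckets.GetValuesForKey(bucketIndex) iterates one bucket's apples.
```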

Otherwise, the pattern you’ll likely have is (a sketch of the first steps follows the list):

  1. Write out apple-bucket pairs in separate containers (NativeStream or dynamic buffers)
  2. Count entries in containers
  3. Prefix sum counts to build packed array index ranges
  4. Copy to packed array
  5. Sort packed array by bucket
  6. Count ranges of buckets
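
Roughly, steps 1–4 might look like this (BucketedApple, the bucketing logic, and the scheduling shape are placeholder assumptions; ComputeItemCount and ToNativeArray are the Collections helpers that handle the counting and the copy):

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

public struct BucketedApple { public int Bucket; public int AppleId; }

[BurstCompile]
public struct ProduceApplesJob : IJobParallelFor
{
    public NativeStream.Writer Out;

    public void Execute(int index)
    {
        Out.BeginForEachIndex(index);   // claim this producer's private range
        int bucket = index % 4;         // placeholder bucketing logic
        Out.Write(new BucketedApple { Bucket = bucket, AppleId = index });
        Out.EndForEachIndex();
    }
}

// var stream = new NativeStream(producerCount, Allocator.TempJob);
// new ProduceApplesJob { Out = stream.AsWriter() }
//     .Schedule(producerCount, 16).Complete();
//
// Steps 2-4: count the entries, then flatten into one packed array.
// int total = stream.ComputeItemCount();
// var packed = stream.ToNativeArray<BucketedApple>(Allocator.TempJob);
// Steps 5-6: sort `packed` by Bucket, then record where each run starts/ends.
```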

This problem of “multiple producers” is something I’ve also had trouble tackling, and I suspect that many people getting into ECS eventually run into it. Unfortunately, I haven’t found any presentation or documentation from Unity that answers this question; as far as I can tell, we’re left to figure it out on our own.

The way I ended up handling this was by using the event system provided by the wonderful Tertle. To use your example of farmers and apples: I produce an event whenever a farmer adds an apple to a bucket. I then sort these events by the bucket that they add to, which ensures that no two jobs add to the same bucket. Lastly, I consume the events in parallel jobs which modify the component values of the bucket entities.

While this allows the modifications to be done in parallel, the additional steps of sorting and of scheduling jobs to execute those modifications can be expensive. I believe that in the vast majority of cases it is cheaper to omit the sorting step and just consume those events in a single-threaded job.
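
To make the shape of the sorted, parallel path concrete (this is a generic sketch of the sort-then-consume idea, not the event library’s actual API): once the events are sorted by bucket and you have prefix-summed offsets, each Execute owns exactly one bucket, so the writes never collide:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

public struct AppleEvent { public int Bucket; public int AppleId; }

[BurstCompile]
public struct ConsumePerBucketJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<AppleEvent> SortedEvents; // sorted by Bucket
    [ReadOnly] public NativeArray<int> BucketStart;         // length numBuckets + 1

    public void Execute(int bucket)
    {
        // This thread is the only one touching `bucket`, so no races here.
        for (int i = BucketStart[bucket]; i < BucketStart[bucket + 1]; i++)
        {
            // apply SortedEvents[i] to this bucket's component values
        }
    }
}

// new ConsumePerBucketJob { ... }.Schedule(numBuckets, 1).Complete();
```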


I looked into a solution similar to Tertle’s event approach. The problem (as you commented) was that giant sorting step. This needs to work with 1-n producers and 1-n buckets, and iterating over hundreds of thousands of apples just to sort them into the correct buckets is something I’m quite hesitant to do.

Maybe the hardest problem with that approach is figuring out the correct sizes for the buckets before adding apples to them. If the information about which apples go in which bucket is embedded in the events themselves, then you’d have to iterate over those events at least once just to figure out how large each bucket needs to be.

(Very glad to be wrong about any of this - please correct me. 🙂 )

You know, I had been avoiding this approach because for some reason I thought it might be considered crude. But anything that might work is worth investigating! It is encouraging to hear someone like you suggest it.

Thank you both very much for your tips. I really appreciate your help. 🙂 I’ve been banging my head against this one for a while, with no winning solutions.


My event system actually has an extension method specifically for this, called EnsureHashMapCapacity, that’ll resize a hashmap to ensure its capacity fits the number of events. And if you don’t want to use a hashmap, you can just use the GetEventCount extension. [It’s pretty fast - you don’t actually need to iterate the events to get the count; it’s counted on adding.]

(not saying my event system is suited for this specific problem, I haven’t read or thought about it enough)


Is it possible for you to set a maximum capacity for a bucket, such that if you run out of space in a bucket, you just ignore any further apples that should go there?

Also, will any two farmers attempt to pick the same apple?


I usually avoid NativeMultiHashMap.ParallelWriter because of determinism/pre-allocated sizing issues.

I guess the biggest unknown for me is what your producer jobs look like. If every producer could get its own NativeMultiHashMap (non-parallel) that it can write to, you can do the scattering in a single-threaded context (with multiple contexts running in parallel with each other), and then you can run a parallel algorithm to merge the containers.

Otherwise, if you decide to merge the producers and then shuffle, a counting sort works really well for this kind of problem.
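
For reference, the counting-sort shape, single-threaded here for clarity (each pass also parallelizes well; BucketedApple is just a placeholder pair):

```csharp
using Unity.Collections;

public struct BucketedApple { public int Bucket; public int AppleId; }

public static class BucketSort
{
    // Counting sort by bucket: histogram, exclusive prefix sum, then scatter.
    public static void CountingSortByBucket(
        NativeArray<BucketedApple> src,
        NativeArray<BucketedApple> dst,  // same length as src
        NativeArray<int> offsets,        // length numBuckets + 1, zero-initialized
        int numBuckets)
    {
        // 1. Histogram: apples per bucket (shifted up one slot).
        for (int i = 0; i < src.Length; i++)
            offsets[src[i].Bucket + 1]++;

        // 2. Exclusive prefix sum: offsets[b] becomes bucket b's first slot.
        for (int b = 1; b <= numBuckets; b++)
            offsets[b] += offsets[b - 1];

        // 3. Scatter into place. This advances offsets as it goes, so afterwards
        //    offsets[b] is the *end* of bucket b; copy the array first if you
        //    also need the start indices later.
        for (int i = 0; i < src.Length; i++)
            dst[offsets[src[i].Bucket]++] = src[i];
    }
}
```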


Unfortunately, no. No apple can be left behind.

Normally, in this case, I would just calculate the max number of apples which can be picked per farmer, per update. Then I would resize each bucket buffer to be long enough for the max apples of all current farmers.

But in this case, there could be so many farmers that I’m not sure that’s feasible. You could end up with thousands of massive buffers, each of which might be mostly empty.

Thankfully, no! All apples picked are unique.

I should say that the Apple example is a stand-in for another problem. Due to NDAs, I sadly cannot post the other case. ☹️ But trust me, this is a 1:1 translation. Solving the Apple problem will solve the other case.

It is a hard problem. I promise I’m not being difficult for the sake of difficulty. 🙂 Thank you very much for your replies.


You said “Later in the game loop, other jobs might iterate over all the apples in a specific bucket.”

Is this the primary problem being solved? I.e., what you’re really trying to accomplish here is a way for later systems in the frame to efficiently retrieve the apples that were in a particular bucket. Or does the act of bucketing the apples itself serve some important purpose?

And you said “might iterate” - does this mean that some buckets, having been produced, do not actually get used?

Is there frame coherence to consider here - i.e. might the same apples exist the next frame, and if so, will they mostly sort to the same buckets?


Thank you so much for responding:

Yes, you’ve nailed it. However, for the jobs which might access these buckets later, order may be important. I may need to sort the apples, before they are processed in later jobs.

Yes. Some buckets might never be used, after they are populated. In this case, they will still be cleared before the next round of apples is added.

Thankfully, no. Apples will never last more than one frame.

[quote="PublicEnumE, post:10, topic:800760, username:PublicEnumE"]
Yes, you’ve nailed it. However, for the jobs which might access these buckets later, order may be important. I may need to sort the apples, before they are processed in later jobs.
[/quote]

How do you know whether you will/won’t need the apples to be sorted? (Is it dependent on the bucket?) What % of buckets do you estimate will need to be sorted?

[quote]
Some buckets might never be used, after they are populated. In this case, they will still be cleared before the next round of apples is added.
[/quote]

What % of the time do you think this will happen?

Also, for the buckets that do get used: how many downstream jobs will need the contents of a specific bucket? Will an individual bucket be read by many jobs, or will each bucket only really be read once?

What I’m wondering about is how much you actually need to be generating these extracted lists per-bucket. Your farmers are basically doing “compute which bucket this apple goes into” - what if, instead of storing the apple in a per-bucket buffer, they just labelled the apple with the result?

You could have the farmers maintain two further pieces of information to help (both sketched in the code after this list):

  • You could have a single shared buffer of bucket counts (i.e. NativeArray of length numBuckets). All entries are zero at the beginning of the frame, and as each farmer computes the result for an apple, they use Interlocked.Increment to increase the counter for the given bucket.
  • You could use Chunk component data to store something like a bloom filter of which buckets are used by the apples in this chunk. You can trade off the accuracy of the filter against the memory used per chunk.
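
A rough sketch of both pieces (names are hypothetical; Interlocked.Increment needs a ref, so the shared counts are reached through the array’s unsafe pointer, and the “bloom filter” here is the simplest possible one - a single 64-bit mask of bucket % 64 bits):

```csharp
using System.Threading;
using Unity.Collections.LowLevel.Unsafe;

// Shared per-frame bucket counts, incremented from many farmer jobs at once.
public unsafe struct BucketCounters
{
    // Points at a NativeArray<int> of length numBuckets, zeroed each frame:
    //   Counts = (int*)countsArray.GetUnsafePtr();
    [NativeDisableUnsafePtrRestriction] public int* Counts;

    public void Record(int bucket) => Interlocked.Increment(ref Counts[bucket]);
}

// Chunk component: a tiny bloom-style filter of which buckets this chunk's
// apples landed in. One ulong = 64 bits; widen it for fewer false positives.
public struct ChunkBucketFilter
{
    public ulong Mask;

    public void Add(int bucket)          => Mask |= 1ul << (bucket & 63);
    public bool MightContain(int bucket) => (Mask & (1ul << (bucket & 63))) != 0;
}
```

Downstream, a job that wants bucket b simply skips any chunk whose filter says MightContain(b) is false.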

After all the farmers have run, you’d have an accurate per-bucket count, as well as the ability to iterate over the apples and find the members of any given bucket (skipping chunks that definitely don’t have apples in the given bucket). I’m assuming that your downstream jobs will be reading the data from the apples anyway, so you’re going to be pulling those chunks into the cache no matter what.

And for the trickiest case, where you need the contents of a bucket to be sorted - I think for that you would still need to allocate a buffer, but by doing it after the farmers have completed, you’d know the exact size of the buffer you have to allocate, and you’d have the chunk filter to help you extract the apples quickly (you could compute a sort key at the same time). Then, after extracting the apples, you sort them, ready for downstream jobs to use.
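
Sketching that last path (names hypothetical; counts[] is the per-bucket tally the farmers produced, and the sort is the Unity.Collections extension):

```csharp
using Unity.Collections;

public struct SortableApple : System.IComparable<SortableApple>
{
    public int AppleId;
    public float SortKey; // computed during extraction

    public int CompareTo(SortableApple other) => SortKey.CompareTo(other.SortKey);
}

// Exact-size allocation, since counts[bucket] is known by now:
// var extracted = new NativeList<SortableApple>(counts[bucket], Allocator.TempJob);
//
// Extraction: iterate only chunks where ChunkBucketFilter.MightContain(bucket)
// is true, appending { AppleId, SortKeyFor(apple) } for each apple that maps
// to `bucket` (SortKeyFor is a placeholder for whatever ordering you need).
//
// extracted.Sort(); // Unity.Collections sort extension, uses CompareTo above
```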
