[Solved] C# Job System VS Managed threaded code

Short question
What is the best way to execute expensive code that requires managed code while the game uses the C# Job System?

A little bit elaborated
Let’s say our game needs to execute quite expensive calculations which are not (realistically) possible to write for the C# Job System (heavy cryptography work, image processing using 3rd-party libraries, generally any work which can be executed on a separate thread and is too expensive to rewrite using only blittable types). How can this be done while extensively using the job system?

A more specific example
To be a little bit more specific: let’s say I have a huge number of IJob instances and a similarly big number of IManagedJob (my own interface) instances. Each of them can be executed on a separate thread; some of them must be completed in the current frame, others can take several frames. I can see multiple ways to handle this problem, but none seems optimal:

  • Use the C# Job System AND my custom managed job system at the same time.
      • Advantages: the code is simple to write.
      • Disadvantages: if my managed job system uses (core count − 1) threads, then there are (core count − 1) × 2 threads in total, which makes the whole system ineffective (context switching).

  • Ignore the C# Job System and run IJob instances in my managed job system.
      • Advantages: optimal thread count.
      • Disadvantages: losing Burst optimizations; impossible to use ECS.

  • First schedule jobs from one system; when they complete, schedule jobs from the other.
      • Advantages: (almost) no overhead from context switching.
      • Disadvantages: requires precise timing on the main thread for when to schedule the other jobs; there will always be some context-switching overhead from jobs that last several frames.

  • Use a smaller thread count in my managed job system.
      • Advantages: less overhead from context switching.
      • Disadvantages: depending on the managed tasks, this might be the least effective variant of all (if there are too many managed jobs).

Real-world problem
If you want to ask what real-world problem I’m solving: I joined a project which is in the middle of development and uses its own managed job system. My task is to implement something which could hugely benefit from the C# Job System (lots of low-level calculations on arrays), but rewriting their managed jobs for the C# Job System is not possible due to time and financial constraints.

Hypothetical problem
After my experience in the industry, I’m 99% sure that there will always be jobs which require managed code. I cannot imagine how to write such code while using ECS in new projects.

You could use the low-level APIs to make your own job type.

https://github.com/Unity-Technologies/EntityComponentSystemSamples/blob/master/Documentation/content/custom_job_types.md

A simple way would be to use GCHandle to keep a reference to a managed object, then in the custom job cast it back to the required interface and invoke whatever you need to run your non-Burst C# code.

In this setup you lose:

  • The safety system: you have full access to C# and any memory, and there will be nothing to report race conditions
  • Burst can’t be used

But at least you can have jobs with dependencies on other jobs.

So I would suggest keeping as much of the code as possible in NativeContainers + real C# jobs. The code that can’t be ported, because it depends on code you don’t control, keep in those GCHandle-based custom managed job types.

Optimally, of course, over time you find a way to convert all of the C# code to be Burst-compliant with good data layout, so that you actually get speedups beyond just running the code in parallel.


I see, thank you very much, I’ll check that. It seems like the solution I was looking for.

Even if I don’t take into account things I cannot control (in our development)… I really don’t think I’m capable of doing that in reasonable time and quality for such a huge codebase. I mean, converting “all of the C# code to be Burst-compliant” means losing a lot of OOP features and almost feels like switching back to a very restricted subset of C.

So I tried to use GCHandle and found a working solution, which I’m posting here for reference and (potential) review:

using System.Runtime.InteropServices;
using Unity.Jobs;

interface ITask
{
    void Execute();
}

struct Job : IJob
{
    // GCHandle is blittable, so it can be stored in a job struct.
    public GCHandle _Task;

    public void Execute()
    {
        ITask task = (ITask)_Task.Target;
        task.Execute();
    }
}


// Get an instance of the task to be executed
ITask task = GetTaskInstance();

// Create a handle referencing the task. Note: if there are problems with
// storing a GCHandle instance in the future, it can be converted to/from IntPtr.
GCHandle taskHandle = GCHandle.Alloc(task);

// Schedule the job
Job job = new Job()
{
    _Task = taskHandle,
};

JobHandle jobHandle = job.Schedule();

// ...

jobHandle.Complete();

// Release the handle to the managed object
taskHandle.Free();

As I thought, in a real-world project this is pretty much out of the question…

I’m sorry to bother you again. I would just like to ask/confirm: is using GCHandle, and using managed code in a job’s Execute function (not in member fields), in jobs without the Burst attribute going to be officially supported in the future? For example, AFAIK you’re planning to block usage of static variables/methods from jobs with static analysis. Is there a possibility that you are going to block managed code in jobs without the Burst attribute?

We will perform static analysis, but as with all Burst-related features our principle is “performance and safety by default”. It’s on by default, but you can disable it where necessary.

An example of current approach is this:

[NativeDisableParallelForRestriction]
NativeArray<int> indexList;

or

[NativeDisableUnsafePtrRestrictionAttribute]
MyPointer* pointer;

which allows you to use unsafe pointers in a job; clearly that’s unsafe and not a great default to allow. So our approach is that when choosing things that aren’t provably safe, you have to manually specify that this is what you intended. At that point you are on your own to shoot yourself in the foot :slight_smile:

The same principles will apply to static analysis.


Also do note that the GCHandle approach is used in ECS for streaming scenes right now. Turns out you need a string for that…


Thank you for your answer.

(I interpret it as: we will always be able to call managed code from jobs (without the Burst attribute), and if static analysis is ever used to block such code, we will be given an attribute to disable the safety check case by case.)

Yes.


In your context, without the Burst speedup, I don’t see any advantage of the Unity Job System over a native C# one. Or am I wrong?

There are advantages:

  • When I use the Unity C# Job System, I don’t need to spawn my own threads. Thanks to that, there are always (C − 1) threads running (C = core count), so the system isn’t slowed down by context switching.

  • When I have a big task and part of it can be written as Burst-optimized code. In other words, there is a task T which can be separated into T1 (managed code), T2 (Burst-optimized code) and T3 (managed code).

      • If I don’t use the C# Job System for T1 and T3, then I schedule T1 on my custom thread and wait until it’s finished (which can be checked the next frame). When it’s done, I take its results and schedule T2 in the job system. Again, I wait for the results, checking every frame, and when it’s done, I finally schedule T3 on my custom thread.

      • If I do use the C# Job System for everything, I schedule T1, T2 and T3 at the same time as jobs using the dependency system (T3 depends on T2, which depends on T1). Then I just wait for the result of the whole T. Thanks to this approach, I avoid the delays before scheduling T2 and T3.

So even though using managed C# code in the Unity C# Job System isn’t faster per job (because it doesn’t use the Burst compiler), it’s faster in the context of the whole application.
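A sketch of that chained scheduling, assuming T1/T3 are GCHandle-based managed jobs and T2 is a Burst-compilable job (t1Job, t2Job and t3Job are illustrative names, not real APIs beyond Unity.Jobs itself):

```csharp
using Unity.Jobs;

// All three stages are scheduled up front; the dependency chain lets the
// worker threads start each stage as soon as the previous one finishes,
// without waiting for the main thread to come around next frame.
JobHandle h1 = t1Job.Schedule();       // T1: managed code (GCHandle job)
JobHandle h2 = t2Job.Schedule(h1);     // T2: Burst job, starts right after T1
JobHandle h3 = t3Job.Schedule(h2);     // T3: managed code, starts right after T2

// Later (possibly several frames on), fetch the final result:
h3.Complete();
```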

  • If I’m not mistaken, you can configure the worker thread count.
  • In your example, T1 and T3 could be implemented with native C# tasks. It seems to me that this would be faster and a lot easier to integrate with existing code.

AFAIK I can’t; that’s handled internally by Unity. (If you mean the Unity C# Job System’s worker threads.)

Easier, yes. Faster, it depends. In terms of individual tasks, yes, it would be faster, because there wouldn’t be the overhead of the Unity C# Job System and of allocating the unmanaged reference. However, the time from the start of T1 to the end of T3 will be greater due to the delays between scheduling each sub-task.

Example time. For simplicity, let’s say we target 30 fps, so we have 33 ms per frame. And let’s say each sub-task (T1, T2, T3) takes 10 ms.

When managed code is executed from non-job threads:

  • Frame #1: schedule T1. It runs on my thread and takes 10 ms; the rest of the time the thread sleeps.
  • Frame #2: the main thread fetches the result of T1 and schedules T2. It runs on a Unity job system worker thread and takes 10 ms; the rest of the time the thread sleeps.
  • Frame #3: the main thread fetches the result of T2 and schedules T3. It runs on my thread and takes 10 ms; the rest of the time the thread sleeps.
  • Frame #4: the main thread fetches the result of T3.

When managed code is executed using job threads:

  • Frame #1: schedule T1, T2 and T3 using the dependency system. A Unity job system worker thread executes T1 in 10 ms, can immediately start T2 (another 10 ms), and when that’s finished, starts T3. Together, it took 30 ms.

  • Frame #2: fetch the result of T3.

If IJob.Schedule() can only be called from the main thread, then you are right. Once we start messing with Unity, we have to stick with it :slight_smile:

Exactly

There is a work-around: you can use all the Unity hook points, like Update, LateUpdate, OnRenderObject, OnPreRender and WaitForEndOfFrame, to schedule T2 and T3. That provides finer granularity than a one-frame delay.

Or better: if you know T1 runs within a single frame, then after scheduling T1 in Update(), in OnRenderObject() you could call AutoResetEvent.WaitOne before scheduling T2, and at the end of T1 call AutoResetEvent.Set to unblock the main thread.
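A minimal sketch of that idea, assuming a MonoBehaviour drives the hook points and T1 signals the event from its worker thread (RunT1 and ScheduleT2 are placeholder names):

```csharp
using System.Threading;
using UnityEngine;

public class ChainedScheduler : MonoBehaviour
{
    readonly AutoResetEvent _t1Done = new AutoResetEvent(false);

    void Update()
    {
        // Kick off T1 on a non-job thread; it signals the event when finished.
        ThreadPool.QueueUserWorkItem(_ =>
        {
            RunT1();        // placeholder for the managed T1 work
            _t1Done.Set();  // unblock the main thread in OnRenderObject
        });
    }

    void OnRenderObject()
    {
        // Block the main thread until T1 is done, then schedule T2.
        // WaitOne is the blocking call (inherited from WaitHandle).
        _t1Done.WaitOne();
        ScheduleT2();       // placeholder: e.g. t2Job.Schedule()
    }

    void RunT1() { /* managed calculation */ }
    void ScheduleT2() { /* Burst job scheduling */ }
}
```

Note the obvious cost of this approach: the main thread stalls in WaitOne if T1 overruns the frame.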

For me, using GCHandle is the best one.

  • I can schedule all the steps at once, at one point, instead of tracking them all around the code.
  • The work-around using all the Unity hooks shortens the delay, but it’s still bigger than when a worker thread starts the next step immediately. I don’t think you can top that.
  • T1 running within one frame: that was just an example trying to explain that scheduling manually on the main thread will always, always add some delay. The duration of a job is really unpredictable and differs on each machine, so you cannot make any assumptions (like “it will be finished next frame”).
  • IMHO the work-around you suggest also adds more complexity to the code.

I like the GCHandle approach too. The only drawback I see is the GC alloc it causes.

In Aithoneku’s code, I see a mention of switching it to an IntPtr. How would you use it then?

I’m trying to use UnsafeUtility.PinGCObjectAndGetAddress instead, which gives you a ulong handle instead of an actual GCHandle. I’m not sure how to convert either the pointer or the ulong handle back to the managed object…

You can always use object pooling: the pool could use a structure which keeps both the managed object and its GCHandle. Of course, you still need to handle deallocating the memory at the appropriate time.
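As a sketch, such a pool could pair each task with its handle so that GCHandle.Alloc is paid only once per pooled object rather than once per schedule (TaskPool and its members are hypothetical names, not an existing API):

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical pool pairing each pooled task object with its GCHandle.
class TaskPool<T> where T : class, new()
{
    struct Entry { public T Task; public GCHandle Handle; }

    readonly Stack<Entry> _free = new Stack<Entry>();

    // Rent a task together with its already-allocated handle.
    public (T task, GCHandle handle) Rent()
    {
        if (_free.Count > 0)
        {
            Entry e = _free.Pop();
            return (e.Task, e.Handle);
        }
        var task = new T();
        return (task, GCHandle.Alloc(task)); // allocated once, reused afterwards
    }

    // Return the pair instead of freeing the handle.
    public void Return(T task, GCHandle handle)
    {
        _free.Push(new Entry { Task = task, Handle = handle });
    }

    // Free all handles at an appropriate time (e.g. on shutdown).
    public void Clear()
    {
        while (_free.Count > 0)
            _free.Pop().Handle.Free();
    }
}
```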

My comment says “if there will be problems with instance of GCHandle in future, it can be converted to/from IntPtr”. The point is that I wasn’t sure whether there would be problems with using GCHandle instances with the job system, and I noticed the methods ToIntPtr and FromIntPtr; in both cases you simply store an unsafe pointer in a structure.

Well, I can see ways to convert between them, but I don’t know whether the following is safe. I don’t know the unsafe part of C# well enough to know the dangers!

IntPtr is just a container for a pointer. First, I can see there are constructors for IntPtr taking a void pointer and a long integer. It’s not ulong, and I don’t know whether it’s OK in this case to cast between them. Then you can convert it to a GCHandle. But as I wrote, I don’t know whether that’s safe. All I can do is recommend studying how these things work.
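For what it’s worth, the GCHandle half of the round trip can be sketched like this. GCHandle.ToIntPtr and GCHandle.FromIntPtr are documented .NET APIs, but this applies only to handles created with GCHandle.Alloc (ITask and GetTaskInstance are the hypothetical names from the earlier snippet):

```csharp
using System;
using System.Runtime.InteropServices;

ITask task = GetTaskInstance();
GCHandle handle = GCHandle.Alloc(task);

// IntPtr is blittable, so it can be stored in a job struct directly.
IntPtr ptr = GCHandle.ToIntPtr(handle);

// Inside the job: recover the handle and then the managed object.
GCHandle recovered = GCHandle.FromIntPtr(ptr);
ITask recoveredTask = (ITask)recovered.Target;

// Free exactly once, through the same API family that allocated it.
handle.Free();
```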

But what I can add is a very important rule: if you allocate something, anything, you should deallocate it through the same source (class, unit, etc.). So if you allocate a handle with UnsafeUtility.PinGCObjectAndGetAddress, deallocate it only with UnsafeUtility.ReleaseGCObject (and don’t use, for example, GCHandle.Free). Keep this rule even when you switch systems, programming/scripting languages, etc. The behavior of mixing them is undefined (unless explicitly stated)! That means “it might work now, it might work next week, but nothing guarantees it will work later, or when you change the configuration (debug/release), the system, or anything else”.