My guess is that Burst works in a similar way to how a native DLL works with Mono. If this is true, would it be feasible to make the Burst attribute available for code outside the Job System as well?
Similarly, would it be possible to have a thread-safe API (transforms, raycasts) that works outside the Job System?
Currently this mode is not available. A Burst call does not work magically like a DllImport call. Internally, it requires a delegate to call the generated native code. This delegate has to be created manually and has to match the signature of the original C# method it was compiled from. So the process is still a bit cumbersome, error-prone and not really user-friendly in C#. I would also be more in favor of integrating it with the Mono JIT and IL2CPP, so that a method tagged with [BurstCompile] could be called directly without going through a delegate. That would tie the Burst integration much more tightly to the way the JIT or IL2CPP marshal arguments, so it is neither easy to integrate nor portable… not sure we will go down this path. We may, though, expose the simple delegate path for the 2018.2 preview, stay tuned!
Thanks for the reply! It was a wild guess, but I still hope I will be able to use it outside the Job System.
I have my own multi-threaded scheduling system, which is comparable to the Job System, so I would love not to be forced to use jobs if I don’t clearly see the benefits. Currently the benefits I see do not come from the Job interface, but from the fact that I would like to use Burst and the thread-safe API. I understand that you want to make performance accessible to everyone, but you should not assume that very experienced people don’t use and love Unity (like I do). I believe C# is powerful enough, but I am limited by the Unity API. Burst is also awesome and it would be a shame to limit it to jobs only. What I am trying to say is: you are doing a great job, but don’t limit it to your own vision only, as that could be counterproductive.
It’s just a prototype test API for now but we used it to optimize some main thread code.
That said, there are a couple of reasons why fully using the C# Job System makes sense:
The safety system is the key feature, and the close connection between Containers and C# Jobs is required to provide full race condition detection. I can’t stress enough how important a full safety system is when writing multi-threaded code…
Having two job systems running, one for engine code and another for your own code, will result in a worse frame rate. Switching thread context is expensive, and two job systems fighting for the CPU will not help with that.
We are very invested in making the C# Job System the absolute fastest and most powerful job system.
I would also really love to hear how you get on with the Unity Job System, and what we need to improve to cover the use cases you feel the C# Job System does not serve today.
It’s great seeing you reply on the forum, wow! I tried it for my One Million Points on CPU demo and it works well. I noticed that IL2CPP needs some optimizations, but I think you are aware of that. I also 100% understand your goals for the Job System and I agree with the objectives. I am just saying that medium and big teams with custom solutions may already have alternatives. Our solution currently runs at the same speed as the Job System in my demo, and faster if compiled with IL2CPP.
The only concern I raised in my article about the Job System is that this solution would be optimal in a totally multi-threaded environment, but as long as a lot of work still happens on the main thread, triggering a burst of parallel tasks is less beneficial than running the tasks while the main thread does other things. Obviously I understand that stalling the main thread is necessary for safety reasons.
Again, I love what you are doing, no criticism at all. I would just also love to be able to use all this outside the Job System too. I will test the prototype API asap!
Our solution currently runs at the same speed as the Job System in my demo, and faster if compiled with IL2CPP.
Is it possible to create a benchmark for it? If there is any situation where the Job System is not the fastest solution around we want to address that. Is it scheduling overhead or NativeArray vs builtin C# array performance?
A benchmark comparing it would be really great to get access to and for us to review.
I think in practice the most important thing to benchmark is using it via Burst. NativeArrays in particular are a perfect fit for Burst, and ultimately I expect that most C# code that needs to run fast will run in Burst.
I am not sure I follow the issue about the stalls exactly. With the Unity Job System you create a stall only if you explicitly ask for one by calling Complete() on the job's handle. Otherwise, nothing stops you from working on something else on the main thread and calling Complete() later, when you really need the results.
Somehow it seems that you imply that the way to use Unity Job System should look like this:
var handle = _job.Schedule(...);
// wait for completion - stalls the main thread!
handle.Complete();
// Perform other unrelated calculations
...
While you could mitigate stalls on the main thread by postponing the job.Complete():
var handle = _job.Schedule(...);
// Perform other unrelated calculations
...
// wait for the job to complete - may not stall the main thread if the job has already finished
handle.Complete();
Or even better, if you can afford it, you can double buffer jobs: launch job0 on one frame while waiting for the result of job1 from the previous frame, then swap job0 and job1 on the next frame, and so on.
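To make the double buffering idea concrete, here is a minimal sketch, not a definitive implementation. FillJob, DoubleBufferedJobs and the buffer size are hypothetical names and values made up for illustration; IJob, JobHandle, NativeArray and JobHandle.ScheduleBatchedJobs are the regular Unity Job System API. The result consumed on a given frame is always one frame old, which is the price paid for not stalling.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job used only for illustration: fills a buffer with one value.
struct FillJob : IJob
{
    public NativeArray<float> Output;
    public float Value;

    public void Execute()
    {
        for (int i = 0; i < Output.Length; i++)
            Output[i] = Value;
    }
}

// Hypothetical component: two buffers and two handles, swapped every frame.
public class DoubleBufferedJobs : MonoBehaviour
{
    NativeArray<float>[] buffers;
    JobHandle[] handles;
    int current;        // index of the buffer written by the job scheduled this frame
    float lastResult;   // value read back from the job scheduled on the previous frame

    void Start()
    {
        buffers = new NativeArray<float>[2];
        handles = new JobHandle[2];
        for (int i = 0; i < 2; i++)
            buffers[i] = new NativeArray<float>(1024, Allocator.Persistent);
    }

    void Update()
    {
        int previous = 1 - current;

        // The job from the previous frame has had a whole frame to run,
        // so this Complete() should rarely have to wait.
        handles[previous].Complete();
        lastResult = buffers[previous][0];

        // Launch this frame's job on the other buffer and let it run in the background.
        var job = new FillJob { Output = buffers[current], Value = Time.time };
        handles[current] = job.Schedule();
        JobHandle.ScheduleBatchedJobs();

        // Swap roles for the next frame.
        current = previous;
    }

    void OnDestroy()
    {
        if (buffers == null) return;
        for (int i = 0; i < 2; i++)
        {
            handles[i].Complete();
            buffers[i].Dispose();
        }
    }
}
```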
Also, one of the potential reasons the Unity Job System can be slower than a custom C# task system is that its scheduler is written in C++, so when it calls the job function it has to perform a costly transition from unmanaged to managed code and back to unmanaged.
As Joachim suggested, where the Unity Job System shines is when it is used with Burst. In that case there are no managed/unmanaged transitions, the job threads are not processed by the Mono GC, unlike C# threads (so overall it should be better for GC pressure as well), and the codegen produced by Burst can give a significant boost.
I converted your sample to use Burst (mainly by using the new Unity.Mathematics and removing static variable access), and the Unity Job System with Burst is roughly 2x faster than your custom C# task system.
Yes, you are absolutely right. Thanks for the enlightenment, I will profile it again.
To be honest, there is no way to know from the function names when the jobs actually start. From the examples I had seen so far, it seemed more reasonable that Schedule was just preparing the jobs, while Complete was starting them and waiting for them.
Obviously now it makes more sense, but if you didn’t tell me, I wouldn’t have guessed.
About the performance, I also thought Unity Jobs could be slower for that reason, but it is not. Currently Unity Jobs is slower only when compiled to native code through IL2CPP, but that is a problem you already know about.
Of course Burst is what I am looking forward to. My concern is how to use Burst with IL2CPP. Once we move to 2018, I don’t see any reason not to use IL2CPP. I still expect Burst to do a better job than the Microsoft compiler, even with all the optimizations enabled.
I will replicate your work for my new profiling. Why did you need to remove the static access? I guess it is because Burst doesn’t compile it, right?
Indeed, static read access to readonly fields is not yet supported by Burst. We hope to be able to bring it by 2018.2. Note that static reads of non-readonly fields, and static writes, will not be possible (obviously also for thread safety reasons), as they would require access to the “managed” .NET memory, which is VM dependent (e.g. Mono or .NET).
and the timings didn’t change compared to the ones in my article… while the main thread is stuck for 10ms, the jobs should start, but it seems they don’t? According to what you said, they should. I mean, they could have started (I cannot verify it), but in that case the timing should be 10ms faster.
You can set up many jobs with Schedule, then tell them to start with JobHandle.ScheduleBatchedJobs();
If you do not start them, they will only start when you call Complete.
Schedule doesn’t actually schedule the jobs immediately, but adds them to a queue. Jobs are scheduled when you call JobHandle.ScheduleBatchedJobs or JobHandle.Complete. This is done for performance reasons, since scheduling individual jobs results in expensive Semaphore.Signal calls. By delaying the scheduling and starting many jobs at the same time, this cost is paid only once per ScheduleBatchedJobs call.
I am not sure I follow your question either, but function pointers in Burst are only callable from HPC#/Burst jobs.
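As a rough illustration of the queueing behaviour described above, the sketch below schedules a job, flushes the queue explicitly, does other work, and only then calls Complete. SquareJob, ScheduleExample and the array size are hypothetical and only there to make the example self-contained; the scheduling calls are the standard Unity Job System ones.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job used only for illustration: squares every element in place.
[BurstCompile]
struct SquareJob : IJob
{
    public NativeArray<float> Values;

    public void Execute()
    {
        for (int i = 0; i < Values.Length; i++)
            Values[i] = Values[i] * Values[i];
    }
}

public class ScheduleExample : MonoBehaviour
{
    void Update()
    {
        var values = new NativeArray<float>(1024, Allocator.TempJob);
        var job = new SquareJob { Values = values };

        // Schedule only adds the job to the queue.
        JobHandle handle = job.Schedule();

        // Flush the queue so the worker threads can pick the job up now,
        // instead of waiting until Complete is called.
        JobHandle.ScheduleBatchedJobs();

        // ... unrelated main thread work goes here, running alongside the job ...

        // Blocks only if the job has not finished yet.
        handle.Complete();
        values.Dispose();
    }
}
```

Without the ScheduleBatchedJobs call, the job would most likely only begin when Complete is reached, which matches the unchanged timings reported earlier in the thread.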
You can’t call any burst compiled HPC# code without going through a job.