IJobParallelForTransform with 15,000 transforms runs on a single job thread: any hints?

Hi!

I’m experimenting with the Job System and have implemented a simple scenario where cubes fall and are respawned once they reach a certain Y level. I have two jobs, the second dependent on the first, and both reference the same native array of transforms. It runs without errors or warnings, but the work is never split across job threads.

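The job’s code:

```csharp
using Unity.Burst;
using UnityEngine;        // Vector3
using UnityEngine.Jobs;   // IJobParallelForTransform, TransformAccess

namespace Ships.JobSystem
{
    // Burst-compiled job that moves each transform along the Y axis
    // by speed * deltaTime.
    [BurstCompile]
    public struct MoveJob : IJobParallelForTransform
    {
        public float speed;      // units per second along Y
        public float deltaTime;  // frame delta, passed in from the main thread

        public void Execute(int index, TransformAccess transform)
        {
            Vector3 position = transform.position;

            position = position + new Vector3(0, speed, 0) * deltaTime;

            transform.position = position;
        }
    }
}
```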

Profiler capture as proof (development build, not captured in the editor):

Stats window:

I’m running:

Version 2018.2.0b8 (fed204371f5a)
Wed, 30 May 2018 15:35:38 GMT
Branch: 2018.2/staging

My project manifest:

```json
{
    "dependencies": {
        "com.unity.modules.ui": "1.0.0",
        "com.unity.modules.tilemap": "1.0.0",
        "com.unity.modules.physics2d": "1.0.0",
        "com.unity.modules.assetbundle": "1.0.0",
        "com.unity.modules.unitywebrequestassetbundle": "1.0.0",
        "com.unity.modules.unityanalytics": "1.0.0",
        "com.unity.modules.umbra": "1.0.0",
        "com.unity.analytics": "2.0.16",
        "com.unity.modules.vehicles": "1.0.0",
        "com.unity.ads": "2.0.8",
        "com.unity.modules.imageconversion": "1.0.0",
        "com.unity.modules.director": "1.0.0",
        "com.unity.modules.video": "1.0.0",
        "com.unity.modules.audio": "1.0.0",
        "com.unity.modules.unitywebrequest": "1.0.0",
        "com.unity.textmeshpro": "1.2.1",
        "com.unity.modules.ai": "1.0.0",
        "com.unity.modules.unitywebrequestwww": "1.0.0",
        "com.unity.purchasing": "2.0.1",
        "com.unity.modules.particlesystem": "1.0.0",
        "com.unity.standardevents": "1.0.13",
        "com.unity.modules.imgui": "1.0.0",
        "com.unity.modules.physics": "1.0.0",
        "com.unity.modules.screencapture": "1.0.0",
        "com.unity.modules.xr": "1.0.0",
        "com.unity.modules.terrain": "1.0.0",
        "com.unity.modules.unitywebrequestaudio": "1.0.0",
        "com.unity.modules.jsonserialize": "1.0.0",
        "com.unity.modules.terrainphysics": "1.0.0",
        "com.unity.entities": "0.0.12-preview.6",
        "com.unity.modules.animation": "1.0.0",
        "com.unity.package-manager-ui": "2.0.0-preview.3",
        "com.unity.modules.cloth": "1.0.0",
        "com.unity.modules.uielements": "1.0.0",
        "com.unity.modules.vr": "1.0.0",
        "com.unity.modules.unitywebrequesttexture": "1.0.0",
        "com.unity.modules.wind": "1.0.0",
        "com.unity.incrementalcompiler": "0.0.42-preview.1"
    },
    "registry": "https://packages.unity.com",
    "testables": [
        "com.unity.collections",
        "com.unity.entities",
        "com.unity.jobs"
    ]
}
```

Any suggestions as to what I might have missed to get the job split across threads?

Thank you!
Jakub


Could it be that IJobParallelForTransform has no Schedule extension for setting the batch size (grouping), the way IJobParallelFor does?

I have to schedule it like this:

```csharp
shipTransforms.moveJobHandle = moveJob.Schedule(shipTransforms.shipsAccess);
```

Although the code in ShipMoveJob is small, it still takes 2 ms on a single thread, so spreading it over 7 threads could bring it down to about 0.28 ms…

Jakub
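For reference, here is a minimal sketch of how the two chained jobs from the first post might be scheduled. It reuses the OP’s `Ships.JobSystem.MoveJob`; `RespawnJob`, `ShipJobDriver`, the `ships` array and the Y thresholds are hypothetical names chosen for illustration. Because the respawn job writes the same transforms as the move job, it is scheduled with the move job’s handle as its dependency. In the Unity version used here, `Schedule` for an IJobParallelForTransform takes only the TransformAccessArray and an optional dependency, with no batch-size argument.

```csharp
using Unity.Burst;
using Unity.Jobs;
using UnityEngine;
using UnityEngine.Jobs;

// Hypothetical second job: moves transforms that fell below minY back up to respawnY.
[BurstCompile]
public struct RespawnJob : IJobParallelForTransform
{
    public float minY;      // hypothetical "floor" height
    public float respawnY;  // hypothetical respawn height

    public void Execute(int index, TransformAccess transform)
    {
        Vector3 p = transform.position;
        if (p.y < minY)
        {
            p.y = respawnY;
            transform.position = p;
        }
    }
}

// Hypothetical driver: schedules MoveJob then RespawnJob as a chain each frame
// and completes the chain before rendering.
public class ShipJobDriver : MonoBehaviour
{
    public Transform[] ships;   // hypothetical: filled elsewhere, e.g. by a spawner
    public float speed = -5f;
    public float minY = -10f;
    public float respawnY = 50f;

    TransformAccessArray shipsAccess;
    JobHandle jobHandle;

    void Start()
    {
        shipsAccess = new TransformAccessArray(ships);
    }

    void Update()
    {
        var moveJob = new Ships.JobSystem.MoveJob { speed = speed, deltaTime = Time.deltaTime };
        jobHandle = moveJob.Schedule(shipsAccess);

        // Writes the same transforms, so it must wait for the move job.
        var respawnJob = new RespawnJob { minY = minY, respawnY = respawnY };
        jobHandle = respawnJob.Schedule(shipsAccess, jobHandle);
    }

    void LateUpdate()
    {
        jobHandle.Complete();
    }

    void OnDestroy()
    {
        jobHandle.Complete();
        if (shipsAccess.isCreated) shipsAccess.Dispose();
    }
}
```

Note that this chaining alone does not make the work spread across threads; how the transforms are parented still matters.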

IJobParallelForTransform only splits work by hierarchy root. If all your transforms have the same parent, they will execute on the same thread.

Try removing the parents from your cubes, if they have any, or grouping them under different roots (at least as many roots as there are CPU threads).
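A minimal sketch of the second option, assuming a hypothetical `CubeSpawner` component with a `cubePrefab` field: each cube is parented round-robin under one of N root GameObjects, with N taken from `SystemInfo.processorCount`, so the job has several roots it can hand out to worker threads. The spawned transforms are kept in an array so a TransformAccessArray can be built from them afterwards.

```csharp
using UnityEngine;

// Hypothetical spawner: spreads cubes over several root GameObjects so that
// IJobParallelForTransform has more than one root to split across threads.
public class CubeSpawner : MonoBehaviour
{
    public GameObject cubePrefab;   // hypothetical prefab reference
    public int cubeCount = 15000;

    // Spawned cube transforms, e.g. to build a TransformAccessArray from.
    public Transform[] Spawned { get; private set; }

    // Round-robin assignment: cube i goes under root (i % rootCount).
    public static int RootIndex(int cubeIndex, int rootCount)
    {
        return cubeIndex % rootCount;
    }

    void Awake()
    {
        // One root per logical processor.
        int rootCount = Mathf.Max(1, SystemInfo.processorCount);
        var roots = new Transform[rootCount];
        for (int r = 0; r < rootCount; r++)
            roots[r] = new GameObject("CubeRoot_" + r).transform;

        Spawned = new Transform[cubeCount];
        for (int i = 0; i < cubeCount; i++)
        {
            var position = new Vector3(Random.Range(-50f, 50f), Random.Range(0f, 100f), Random.Range(-50f, 50f));
            GameObject cube = Instantiate(cubePrefab, position, Quaternion.identity, roots[RootIndex(i, rootCount)]);
            Spawned[i] = cube.transform;
        }
    }
}
```

Round-robin keeps the roots roughly equal in size, which matters because a root’s whole subtree is processed on one thread.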


M_R is correct, I’ve split my objects among different parents and it works like a charm.


Unity needs a concept of a static parent or decorative parent so we can still keep things organised.
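Unity has no such built-in concept. One possible workaround, sketched here with a hypothetical `DetachChildrenOnAwake` component, is to keep the cubes grouped under an organisational parent in the editor and unparent them when play starts, so each cube becomes its own root. It assumes the TransformAccessArray is built after this runs (for example in `Start`, since all `Awake` calls in a loaded scene happen before any `Start`).

```csharp
using UnityEngine;

// Hypothetical workaround, not a Unity feature: the parent exists only for
// organisation in the Hierarchy and is dissolved at runtime.
public class DetachChildrenOnAwake : MonoBehaviour
{
    void Awake()
    {
        // Iterate backwards because each SetParent call removes a child
        // from this transform's child list.
        for (int i = transform.childCount - 1; i >= 0; i--)
        {
            // worldPositionStays = true keeps each child where it was in world space.
            transform.GetChild(i).SetParent(null, true);
        }
    }
}
```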


Isn’t ECS supposed to solve these things?

No, ECS has nothing to do with whether or not Unity has the concept of a “decorative parent” in the Hierarchy window.


Necrobump deluxe, but I was thinking of the GameObject conversion. Anyhow, problem solved. :sweat_smile:

[edit] I just noticed that the OP’s job is also not vectorized.


Hi. Can you provide an example of how the OP’s code can be vectorized? The code is simple and I can’t find a way to optimize it further. I want to write better code for my projects; that’s why I’m asking :wink:

Also, I would really appreciate it if you could share a few links about vectorized code and how to write it. I think you are more experienced with DOTS than I am, so you may know good books or articles on the subject.

Sorry, I can’t help. I’m not an expert on the subject; I just read what the Burst Inspector tells me…


I’m one of the handful of people on these forums who could help you out.

However, keep in mind that you should only try to vectorize code that is measurably expensive. Otherwise it is a waste of time, because vectorization alone gets you at most a 4x speedup (8x with AVX) unless you also fix other things related to aliasing, branching, and caching. Anyway, if you have a particular job that you have measured to be too expensive and you would like to performance golf it, either start a new thread and tag me, or PM me directly if you don’t want to share it publicly.


Use a float3 instead of a Vector3. float3 (and everything else in the Mathematics package) is designed to be SIMD-friendly, which is essential for vectorization. I think you can still use Vector3, but as soon as you use the new keyword (which may also happen behind the scenes with Vector3), you are shooting yourself in the foot.
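A sketch of the float3 suggestion applied to the OP’s job, using a hypothetical `MoveJobFloat3`: the velocity vector is built once on the main thread instead of inside `Execute`, and the arithmetic uses Unity.Mathematics types. It assumes the implicit Vector3/float3 conversions that Unity.Mathematics provides. Two caveats: `new Vector3(...)` creates a struct and does not allocate on the heap, and with one position read and written per `Execute` call there may be little for Burst to vectorize, so check the Burst Inspector rather than assuming a speedup.

```csharp
using Unity.Burst;
using Unity.Mathematics;
using UnityEngine.Jobs;

namespace Ships.JobSystem
{
    // Same movement as MoveJob, expressed with Unity.Mathematics types.
    [BurstCompile]
    public struct MoveJobFloat3 : IJobParallelForTransform
    {
        public float3 velocity;   // e.g. new float3(0f, speed, 0f), built once per frame
        public float deltaTime;

        public void Execute(int index, TransformAccess transform)
        {
            // TransformAccess.position is a Vector3; float3 converts implicitly both ways.
            float3 position = transform.position;
            position += velocity * deltaTime;
            transform.position = position;
        }
    }
}
```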

Regarding the original question, that IJobParallelForTransform doesn’t multithread properly: I’m running into the same problem. With IJobParallelForTransform I don’t get any significant speedup compared to running the code on the main thread.

I’m scheduling the job like this:

```csharp
m_PositionJobHandle = m_Job.Schedule(m_TransformsAccessArray);
```

I have tried creating 8 root objects and placing the spawned prefabs evenly under them, but that didn’t improve performance either. Any other ideas why IJobParallelForTransform doesn’t make proper use of all cores?

Thank you very much in advance for all your help.

Kind regards,
Uffe Flarup

  1. Make sure your job is using Burst. You can check that in the Profiler’s Timeline view.
  2. Make sure the work you are doing in the job is more expensive than the work to gather inputs to schedule the job. Ideally you shouldn’t have to do any gathering of inputs, but I have seen some pretty awful attempts.
  3. Show your code and the Profiler timeline.
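To illustrate point 2, here is a minimal sketch of a hypothetical `GatherCostProbe` component that wraps the input gathering in a Profiler sample, so its cost can be compared with the job’s own time in the Timeline view. The `cubeRoot` field and the per-frame rebuild toggle are assumptions for illustration; rebuilding the TransformAccessArray every frame is the kind of gathering that is best avoided by building it once.

```csharp
using UnityEngine;
using UnityEngine.Jobs;
using UnityEngine.Profiling;

// Hypothetical probe: measures the cost of gathering the transforms a job needs.
public class GatherCostProbe : MonoBehaviour
{
    public Transform cubeRoot;        // hypothetical parent of the moving cubes
    public bool rebuildEveryFrame;    // toggle to compare cached vs per-frame gathering

    TransformAccessArray access;

    void OnEnable()
    {
        // Built once. Note that GetComponentsInChildren also returns cubeRoot itself.
        access = new TransformAccessArray(cubeRoot.GetComponentsInChildren<Transform>());
    }

    void Update()
    {
        if (!rebuildEveryFrame) return;

        // Appears as "Gather TransformAccessArray" in the Profiler. If it costs more
        // than the job itself, moving the job to worker threads cannot pay off.
        Profiler.BeginSample("Gather TransformAccessArray");
        access.Dispose();
        access = new TransformAccessArray(cubeRoot.GetComponentsInChildren<Transform>());
        Profiler.EndSample();
    }

    void OnDisable()
    {
        if (access.isCreated) access.Dispose();
    }
}
```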

Thanks a lot for the input. After inspecting some more, I could see that the code actually WAS running on all cores, but the job itself wasn’t the expensive part. Most of the time was spent afterwards, in the UpdateRendererBoundingVolumes calls that Unity makes automatically once the transforms have been moved.
