I understand that this will be mitigated by the Burst compiler in a Jobs situation, but I'd be interested in hearing people's thoughts on it. Any insight as to why? Is it a beta thing that can be expected to go away?
Thanks for the clarification! With "this should not be done", do you mean that it's bad practice and I shouldn't do it, or that the overhead ought to disappear when I build the project? Sorry for being dumb ~
Sure, makes sense! But the ultimate point is performance, right… I'm having trouble actually getting any performance gains out of the job system, because I'm bottlenecked by shuffling my data into a NativeArray and back again. My other idea was to just keep it in a NativeArray to begin with, but that's what got me here: basically all the logic interacting with that NativeArray becomes very slow. So at the end of the day I'm not really able to get any performance out of the system.
Perhaps my situation is just not what the job system is for.
That's because Unity isn't ready yet to use pure ECS effectively. We still need to convert from and back to managed objects in order to have some basic functionality, like rendering, sound, camera, etc. This should change as the Unity team releases new features compatible with the Job System, ECS, and the Burst compiler.
Also, NativeArrays have a lot of checks to guarantee some safety for us. These slow down access, but they should be removed when you build for production, outside the Editor, as @LennartJohansen pointed out.
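For context, this is a sketch (not Unity's actual source) of the pattern that makes those checks editor-only: Unity's collections guard their safety code behind the ENABLE_UNITY_COLLECTIONS_CHECKS define, which is set in the editor but not in release players, so the checked path compiles away entirely in a build.

```csharp
// Sketch only: illustrates the conditional-compilation pattern,
// not Unity's real NativeArray implementation.
public T this[int index]
{
    get
    {
#if ENABLE_UNITY_COLLECTIONS_CHECKS
        // Editor / development builds only: bounds validation.
        if ((uint)index >= (uint)Length)
            throw new System.IndexOutOfRangeException();
#endif
        // Direct read from the unmanaged buffer in all builds.
        return Unity.Collections.LowLevel.Unsafe.UnsafeUtility
            .ReadArrayElement<T>(m_Buffer, index);
    }
}
```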
Perhaps I should just wait then, and try to solve my problems with old-school C# threading in the meantime. There's a lot of potential in the job system, but it might not really be viable outside of specific test scenarios just yet!
As for the safety checks, it's of course good news that it'd be faster in the build, but you still need decent performance in the editor to be able to work on the project…
Don't pay attention to the difference between read and write; when I rerun the test, sometimes read is faster and sometimes write is faster. I don't know why. But the RW numbers should even everything out.
So:
Safety checks in the editor add roughly a 10x performance hit.
Inside a job, a NativeArray performs better than outside a job, but it still loses to a builtin array outside of a job. It probably looked better in a job because of data locality (a regular array has its contents on the heap even if allocated in a local scope, and a NativeArray outside a job may require some pointer dereferencing, whereas inside the job I think Unity made the access more direct and more local).
Safety checks in the editor have a significant performance cost.
The safety checks are completely disabled in the standalone player, and in IL2CPP there is a fast path making builtin arrays and NativeArrays equally fast.
The real performance gains of NativeArray come from the Burst compiler, when writing primarily jobified code with the [ComputeOptimization] attribute. We expect that developers will write any performance-sensitive code to run in a job with Burst.
In Burst, the speed gains from using NativeArray are very significant: usually on the order of 5-15x compared to IL2CPP / Mono.
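To make the recommended path concrete, a minimal Burst-compiled job over a NativeArray looks roughly like this (a sketch assuming the beta-era Unity.Jobs / Unity.Collections API discussed in this thread; the attribute name follows the post above, and in later Unity versions it became [BurstCompile]):

```csharp
using Unity.Collections;
using Unity.Jobs;

[ComputeOptimization] // beta-era Burst attribute, as named above
struct MultiplyJob : IJob
{
    [ReadOnly] public NativeArray<float> input;
    public NativeArray<float> output;

    public void Execute()
    {
        // Tight loop over NativeArrays: this is the code path Burst optimizes.
        for (int i = 0; i < input.Length; i++)
            output[i] = input[i] * 2f;
    }
}

// Usage, on the main thread:
//   var input  = new NativeArray<float>(1024, Allocator.TempJob);
//   var output = new NativeArray<float>(1024, Allocator.TempJob);
//   new MultiplyJob { input = input, output = output }.Schedule().Complete();
//   input.Dispose(); output.Dispose();
```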
I'm finding a similar slowdown, e.g. see Texture2D.GetRawTextureData()… when you deal with it as a generic type and return the NativeArray, accessing the pixel data directly is HUGELY slower than accessing a regular byte array. Like 5 seconds versus 35 seconds. This unfortunately seems to render the function practically useless for anything other than convenience, and it is probably even slower than just using GetPixels()/SetPixels() to make a copy of the pixel data. This seems a bit silly to me. I was hoping for a much faster way to edit pixel data in system memory without having to involve a copy operation on the whole buffer. But the performance is really bad.
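For anyone trying this, the pattern under discussion looks roughly like the following: a sketch assuming the generic GetRawTextureData&lt;T&gt;() overload that returns a NativeArray, where the element type has to match the texture's TextureFormat (here Color32 for RGBA32).

```csharp
using Unity.Collections;
using UnityEngine;

// Sketch: edit pixel data in place, without a GetPixels()/SetPixels() copy.
var tex = new Texture2D(256, 256, TextureFormat.RGBA32, false);
NativeArray<Color32> pixels = tex.GetRawTextureData<Color32>();

for (int i = 0; i < pixels.Length; i++)
    pixels[i] = new Color32(255, 0, 0, 255); // fill with opaque red

tex.Apply(); // upload the modified buffer to the GPU
```

The slowdown being reported is in the per-element NativeArray accesses of that loop when it runs on the main thread in the editor, with safety checks enabled.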
I am interested in this new functionality too (i.e. getting pixel data as a NativeArray). When you say no speed gains, did you follow what Joachim said in the post before yours?
I will only get to it in 10 days when I return. If you find something out earlier, it would be great if you could share.
Why do you have to add an attribute to enable Burst compilation? I might have missed something, but I can't see any reason not to turn on Burst if it's possible to do so. If there are corner cases where it should be disabled, I think it makes more sense to have a [DisableComputeOptimization] attribute for those cases rather than the other way around.
Or is this simply something you have planned, but not gotten around to yet?
If someone uses something like NativeArrayUnsafeUtility.GetUnsafePtr() and writes unsafe (pointers-are-OK) code… would this provide lower-level access to the internal "real" memory buffer (the struct data) of the NativeArray? That is, when you access elements like in an array, is there none of the "overhead" of NativeArrays, so it performs at maximum access speed (like it would when using pointers on regular memory buffers), or will there still be some kind of behind-the-scenes middleman interpreting all the accesses? I.e., can you do away with all the bounds checking and interfaces and just get "full speed" access to the memory this way, even in the editor, without Burst and without writing jobs?
And if so, how would you pin or "fix" the memory so that the garbage collector doesn't try to move it?
So if you actually care about performance you will use NativeArray together with burst jobs.
In IL2CPP, performance of builtin arrays and NativeArray is on par; in Mono in a build, NativeArray is slower than a builtin array.
So as a simple rule, just use NativeArray + Burst jobs and you will get the best possible performance.
In Burst using NativeArray is faster than GetUnsafePtr() because we can guarantee aliasing rules.
If I recall, with jobs there's not really a "safe" way to have multiple jobs/threads literally share access to the same memory buffer, i.e. with the potential of reading and/or writing the same byte in the same buffer "unsafely"… which is actually what I want to be able to do in my case.
Performance tests so far report that the unsafe-pointer approach is way faster, and pretty much the same performance as regular int[] array access. Typically the NativeArray comes in around 7 times slower (in the editor), while the unsafe pointer version runs at about the same "full" speed as normal array access.
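The unsafe-pointer variant being measured here looks something like this (a sketch; GetUnsafeReadOnlyPtr() is an extension method in Unity.Collections.LowLevel.Unsafe, and it bypasses the per-access safety checks entirely, so bounds checking becomes your responsibility):

```csharp
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

static class FastSum
{
    // Raw-pointer access: no bounds or safety checks, in editor or player.
    public static unsafe int Sum(NativeArray<int> array)
    {
        int* ptr = (int*)array.GetUnsafeReadOnlyPtr();
        int sum = 0;
        for (int i = 0; i < array.Length; i++)
            sum += ptr[i];
        return sum;
    }
}
```

Note that a NativeArray's buffer lives in native (unmanaged) memory, so the garbage collector never moves it; no GCHandle pinning of the buffer is required.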
These two return different addresses. I presume the first is the address of the "object" (which contains data and a pointer to some memory), while the second returns the address of the actual memory? Do you think this guarantees that the pointer to the actual memory is also fixed because the object itself is fixed? Or does GetUnsafePtr internally fix the memory behind the scenes?
I'm also not sure whether what I'm pinning here is the actual native array or just my reference to it?
(Also, should I use .SetAtomicSafetyHandle() for some reason to lock it down further, or is that just to take some kind of ownership over it, so that other methods can ask whether it's okay to access the data at the same time?)
(Note to self: objects larger than 85,000 bytes are put on the large object heap and won't be moved by the garbage collector.)
Hey @Joachim_Ante_1, while I'm here… is there any reason why Unity still doesn't let you Apply() a texture with a rect? In my case I am having to split up a large texture into many smaller textures to avoid pushing the entire large texture to graphics memory each frame. If Texture2D.Apply() would simply take a rect, like Apply(new Rect(0, 0, 64, 64)), to apply only a small portion of the texture over the graphics bus, I would be able to just use one large texture and track which parts of it need uploading myself. This would open up a world of better performance for me, plus far fewer draw calls, because splitting a big texture into small ones ramps up the draw call count pretty fast just to keep the upload sizes small.
E.g. in OpenGL 1.x there was glTexSubImage2D(), which does exactly this (uploads just a sub-rectangle of pixel data to an existing texture). Any chance we can get a version of Apply() that'll take a rect and only upload the part of the texture within the rectangle?
@imaginaryhuman_1 you don't seem to be testing with Burst; it's not very relevant to discuss the performance in the editor with Mono. It's not relevant for any code you want in a job.
PinGCObjectAndGetAddress is not how this is meant to be used.
Please follow the standard way Burst / C# jobs / NativeArray is supposed to be used together.
By default that is not allowed. There are various attributes to opt out and allow behaviour that is not provably safe & deterministic.
NativeDisableParallelForRestrictionAttribute, NativeDisableContainerSafetyRestrictionAttribute, NativeDisableUnsafePtrRestrictionAttribute can be used on containers on jobs to circumvent safety mechanisms.
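As a sketch of how those attributes are applied, assuming a parallel job that needs to write outside its own index range:

```csharp
using Unity.Collections;
using Unity.Jobs;

struct ScatterJob : IJobParallelFor
{
    // Without this attribute, the safety system only lets a parallel-for
    // job write to output[index]; with it, any index is allowed, and
    // avoiding races becomes the programmer's responsibility.
    [NativeDisableParallelForRestriction]
    public NativeArray<float> output;

    [ReadOnly] public NativeArray<int> targetIndices;

    public void Execute(int index)
    {
        output[targetIndices[index]] = 1f;
    }
}
```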
When following the recommended path it is very rare that you need those.
Are you saying then that if I use the job system, I should be able to access the native array without any hacking around, and read/write from/to it for massively intensive "pixel" operations, and it will run at the same performance level as the GetUnsafePtr() method… but with that performance level in the build, not in the editor?
And if I switch over to using jobs, how would I mark the Unity-created native array from a Texture2D as having a disabled safety restriction, so I can deal with the race conditions "manually"?
You're winning me over to the job system, but I need reassurance that I'm not going to see LESS performance as a result of working with the texture's native array this way, and seeing it run so much slower in the editor is a bit off-putting.