Hello everyone,
With the prerelease out, it is time I write another one of these reviews. As a reminder:
This feedback is addressed specifically to you, the Unity DOTS Team. I don’t expect each of you to read all of this, but I hope you at least read the parts relevant to your individual teams.
General Feeling
In previous posts, I described the experimental 1.0 release as “promising, but messy”. Well, a lot of that “messy” got cleaned up in a very short amount of time. Now I would describe it as “almost excellent”. Much of this discussion is going to be me highlighting the few sour spots that remain.
The Engine
I get a lot of warnings about TempJob allocations without stack traces, despite having stack traces enabled. I don’t know whether the warnings are my fault or something in the engine.
I also find it really annoying that I can’t easily upgrade to the latest version of Burst. Burst is one of those packages that just keeps getting better every time.
Otherwise, while beta 13 was extremely broken for me, beta 16 seems to have ironed out a lot of the issues I encountered. I don’t stress the engine parts very hard though.
If job scheduling performance improved in 2022.2, I don’t see it, at least in the Editor. It is still way slower than I know is possible from other C++ solutions.
Mathematics
It continues to be my favorite math library. And while there are small areas for improvement (I personally use a different technique for random numbers that works a lot better for ECSs), I honestly don’t mind the inactivity in development.
Collections
For the most part, I really like the design. Especially with containers being fully unmanaged in 2022.2, they have become really powerful, and I can do a lot of cool stuff with them in very few lines of code.
NativeArray.Dispose() should use a native job and not a C# job. Also, there doesn’t seem to be a job-based Dispose equivalent for NativeArrays allocated with CollectionHelper. Speaking of which, why does CollectionHelper.DisposeNativeArray need the Allocator passed in as an argument? Shouldn’t that be stored in the NativeArray?
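To illustrate the asymmetry I mean, here is a hedged sketch (the exact signatures are from memory and may not match the package):

```csharp
using Unity.Collections;

static class CollectionHelperAsymmetry
{
    public static void Example(AllocatorManager.AllocatorHandle allocator)
    {
        NativeArray<float> array = CollectionHelper.CreateNativeArray<float>(128, allocator);

        // There is no Dispose(JobHandle)-style overload here, and the allocator has to be
        // handed back even though the array was created from it.
        CollectionHelper.DisposeNativeArray(array, allocator);
    }
}
```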
I found a bug with NativeHashMap (non-parallel) for a specific sequence. So that container does not seem to be water-tight.
Since my last review, I have discovered AllocatorManager’s methods for custom containers. Custom containers aren’t documented correctly for use with custom allocators; you might want to fix that. However, I can say that AllocatorManager’s methods are pretty awesome to use.
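For anyone else writing custom containers, this is roughly the pattern I mean (a sketch from memory; the exact AllocatorManager extension signatures may differ):

```csharp
using Unity.Collections;

internal unsafe struct MyScratchBuffer
{
    private AllocatorManager.AllocatorHandle m_allocator;
    private float* m_ptr;
    private int m_length;

    public MyScratchBuffer(int length, AllocatorManager.AllocatorHandle allocator)
    {
        m_allocator = allocator;
        m_length = length;
        // Works the same whether the handle is Allocator.Temp, a world update
        // allocator, or a fully custom allocator.
        m_ptr = AllocatorManager.Allocate<float>(allocator, length);
    }

    public void Dispose()
    {
        AllocatorManager.Free(m_allocator, m_ptr, m_length);
    }
}
```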
Otherwise, here’s a mostly copy-and-paste of my previous complaints from 0.51 that are still complaints:
- NativeStream has an element count, which depends on the number of Write calls when writing and is used to determine the expected number of Read calls when reading. However, because writes can be variable-sized, and it can be more optimal to bulk-write but incrementally read, this falls apart fast. I wish there were byte counters instead.
- Native{Parallel}MultiHashMap has a few performance pitfalls when working with unique keys. That should probably be fixed, or at least better documented.
- Can we please get a NativeArray constructor or factory method that lets us specify a more conservative alignment, such as aligning a float array to 16 bytes? (My current workaround is sketched after this list.)
- Unless MemCmp already does this, can we get a SIMD comparison method that compares raw bytes and returns a signed value, so that we can deterministically sort raw bytes of data quickly?
- There is no robust per-thread rewindable allocator. Something close to it exists with Scratchpad, but we really need an allocator that lets us reuse memory per iteration in a job. Temp works fine with IJob. But when you try to use IJobFor or IJobEntity, it is too easy to burn through memory swapping cache lines, invoking fallback allocators, and increasing the overall RAM requirements just to have temporary buffers for intermediate transformations. You kind of designed yourselves into a corner here: because shared statics can’t know which thread ID they run on, it is too easy for someone to grab an Allocator on the main thread and use it in a parallel job rather than one associated with a specific thread ID. And Scratchpad has size limits. I want a multi-block per-thread allocator.
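On the alignment ask above, this is roughly the workaround I use today (a hedged sketch; the caller inherits all the usual caveats of ConvertExistingDataToNativeArray):

```csharp
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;

static class AlignedNativeArray
{
    // Allocate raw memory at a 16-byte alignment and wrap it in a NativeArray<float>.
    public static unsafe NativeArray<float> Create(int length, Allocator allocator)
    {
        void* ptr = UnsafeUtility.Malloc(sizeof(float) * (long)length, 16, allocator);
        var array = NativeArrayUnsafeUtility.ConvertExistingDataToNativeArray<float>(ptr, length, allocator);
#if ENABLE_UNITY_COLLECTIONS_CHECKS
        // The wrapper has no safety handle of its own, so give it one in the Editor.
        NativeArrayUnsafeUtility.SetAtomicSafetyHandle(ref array, AtomicSafetyHandle.Create());
#endif
        return array;
    }
}
```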
Overall, the Collections package is useful for all the common scenarios. I have used pretty much everything except my own fully custom allocators (instead I create RewindableAllocator instances) and some of the unsafe containers that don’t have “Native” equivalents. Besides the points I have already mentioned and a couple other API gaps I am forgetting about, the Collections package is excellent.
Jobs
Finally, you documented IJobFilter! I still haven’t found opportunities to use it to its fullest yet. IJobParallelForFilter and IJobParallelForBatch are still awesome, though I wish you had named their schedule methods ScheduleParallel and given them single-threaded Schedule methods just like IJobFor.
I have a new, much bigger complaint now. There are special rules for generic jobs that, if followed, allow Burst to discover them automatically. Unfortunately, the ILPostProcessor (or whatever it is that allows jobs to be invoked from Burst-compiled ISystems) does not seem to acknowledge these rules. See IN-19950 if you want to take a look.
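For reference, the explicit registration that does work today looks like this (a sketch with a made-up job; my complaint is that the automatic-discovery rules don’t seem to be honored when the job is scheduled from a Burst-compiled ISystem):

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// Explicitly registering the concrete instantiation sidesteps the discovery rules entirely.
[assembly: RegisterGenericJobType(typeof(MyGenericJob<float>))]

[BurstCompile]
public struct MyGenericJob<T> : IJob where T : unmanaged
{
    public NativeReference<T> result;

    public void Execute()
    {
        result.Value = default;
    }
}
```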
I can forgive the naming issues, so other than the generics issue, I would say that Jobs are excellent!
Burst
I don’t know how I can give any more praise to the Burst team. You are beyond excellent! Pretty much everything reasonable I have asked for over the last few years has been implemented. We have compiler hints and assumptions. We have a Burst Inspector that can be searched and can copy snippets to the clipboard. And even things I didn’t ask for, such as native plugin support and shared statics, work flawlessly. And performance just keeps getting better too! A job that previously wasn’t auto-vectorized now is. That particular job indexes array elements using the mod of a double that is incremented every iteration, with potentially repeated indices or gaps between indices, and with the indices clamped to ranges. That is crazy difficult to get right, and yet Burst achieved over a 2X speedup versus the scalar implementation!
But aside from that, you have been absolutely amazingly responsive on the forums!
Right now, I think floating point determinism is likely the next big item for you. But if you need something else, something I would love to be able to do is highlight a block of Burst Inspector code that I want to be faster, click a button, and get detailed info about the specific pieces of knowledge Burst is missing that would let it make even more aggressive optimizations. Then I could give it the proper hints and assumptions to make it perform better. This doesn’t need to be a fast, always-on process. I’m only going to do this for extremely hot code paths and can afford to wait a couple of minutes for this type of analysis.
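For context, this is the kind of manual hinting I already do after staring at the Burst Inspector (a sketch with made-up data; Hint and Loop live in Unity.Burst.CompilerServices, as far as I know):

```csharp
using Unity.Burst;
using Unity.Burst.CompilerServices;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
public struct ScaleJob : IJob
{
    public NativeArray<float> values;
    public float factor;

    public void Execute()
    {
        // Promise Burst the trip count is a multiple of 4 so it can drop the scalar tail.
        Hint.Assume(values.Length % 4 == 0);
        for (int i = 0; i < values.Length; i++)
        {
            // Ask Burst to raise a compile-time error if this loop fails to vectorize.
            Loop.ExpectVectorized();
            values[i] *= factor;
        }
    }
}
```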
Anyways, Burst is beyond excellent.
Entities Baking
I’ve already discussed many of my thoughts on this in a lot more detail with you via other channels. If you want my full notes, PM me and I will send you a copy.
Thank you for fixing UnityObjectRef by the way!
In general, I like bakers a lot compared to conversion. It is much easier to declare complex dependencies correctly. And having a shared world via Baking Systems with [TemporaryBakingType] and [BakingType] is a much more powerful and intuitive experience.
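For readers who haven’t tried it yet, the basic shape is tiny (a minimal sketch with hypothetical names; the exact GetEntity/AddComponent overloads have shifted between prerelease versions):

```csharp
using Unity.Entities;
using UnityEngine;

public class SpeedAuthoring : MonoBehaviour
{
    public float speed = 1f;
}

public struct Speed : IComponentData
{
    public float value;
}

public class SpeedBaker : Baker<SpeedAuthoring>
{
    public override void Bake(SpeedAuthoring authoring)
    {
        // Reading authoring.speed registers the dependency automatically,
        // which is a big part of why incremental rebakes are easier than conversion.
        AddComponent(new Speed { value = authoring.speed });
    }
}
```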
Blob assets are still a problem for most users, though I think I have them solved pretty well with my framework.
Baking systems are still really tough to get right. There are a lot of ways to mess up, and they remain mostly undocumented.
A common but difficult use case is when an authoring component contains a list of GameObjects, and we want to add a component to each entity associated with those GameObjects. It is extremely difficult to do this correctly incrementally, and I would appreciate some additional mechanism to facilitate this type of operation.
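To be concrete, this is roughly the non-incremental baseline I end up writing by hand (a hedged sketch with hypothetical names; handling the incremental cases, such as a GameObject being removed from the list, is exactly the part this does not solve):

```csharp
using System.Collections.Generic;
using Unity.Collections;
using Unity.Entities;
using UnityEngine;

public class TargetListAuthoring : MonoBehaviour
{
    public List<GameObject> targets;
}

// Baking-only buffer that records the entities of the referenced GameObjects.
[TemporaryBakingType]
public struct BakedTargetElement : IBufferElementData
{
    public Entity target;
}

public struct TargetTag : IComponentData { }

public class TargetListBaker : Baker<TargetListAuthoring>
{
    public override void Bake(TargetListAuthoring authoring)
    {
        var buffer = AddBuffer<BakedTargetElement>();
        foreach (var go in authoring.targets)
            buffer.Add(new BakedTargetElement { target = GetEntity(go) });
    }
}

[WorldSystemFilter(WorldSystemFilterFlags.BakingSystem)]
public partial class AddTargetTagSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // Structural changes can't happen while iterating, so queue them up.
        var ecb = new EntityCommandBuffer(Allocator.Temp);
        foreach (var buffer in SystemAPI.Query<DynamicBuffer<BakedTargetElement>>())
        {
            for (int i = 0; i < buffer.Length; i++)
                ecb.AddComponent<TargetTag>(buffer[i].target);
        }
        ecb.Playback(EntityManager);
        ecb.Dispose();
    }
}
```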
However, the biggest frustration is that it is currently impossible to enable or disable individual bakers in an assembly. This is also something I solved in my framework, but I shouldn’t have had to.
Oh, and there’s the handling of hybrid objects, but that’s a can of worms I know you are well aware of. I’m still awaiting your solutions and/or workarounds, and until then, I consider any marketing that suggests “Hybrid is a feature” to be a bunch of BS. Personally, I try to avoid hybrid things as much as possible, so it hasn’t been much of an issue for me. But I’m often stumped when asked about it by others.
Right now, I consider Baking “very good, but incomplete”. But if you include the solutions in my framework, then I am ready to call it excellent. While it took me a while to rewrite my animation system to use bakers, I found I was able to define my optimal data layout much more precisely despite the very chaotic input permutations I have to cope with. Before, I was still trying to figure out what a character customization workflow might even look like. But with the rewrite, it kinda just works.
Last time I wrote one of these, I was really concerned about subscenes. They work now for me. And the issues I have brought up have been taken seriously. And while some of the issues still aren’t solved, I have faith and I greatly appreciate the effort!
Entities Runtime
In Entities 0.51, I had a lot of issues getting IJobEntity to work correctly. In 1.0 experimental, I found myself able to use it for simple gameplay code and some baking systems, but not so much for engine code. In the prerelease, it looks like IJobEntityChunkBeginEnd is going to be my new favorite, as it covers the main use cases I had been using IJobChunk for.
The prerelease fixed a lot of annoyances by adding much needed APIs to SystemAPI. Right now, besides IJobEntity support, the biggest gap is the lack of a SystemAPI.GetLookup().
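In the meantime, this is the pattern I fall back to (a sketch: cache the lookup from SystemState and Update it every frame):

```csharp
using Unity.Burst;
using Unity.Entities;
using Unity.Transforms;

[BurstCompile]
public partial struct MyLookupSystem : ISystem
{
    private ComponentLookup<LocalToWorld> m_localToWorldLookup;

    public void OnCreate(ref SystemState state)
    {
        m_localToWorldLookup = state.GetComponentLookup<LocalToWorld>(isReadOnly: true);
    }

    public void OnDestroy(ref SystemState state) { }

    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // Refresh the cached lookup, then hand it to jobs.
        m_localToWorldLookup.Update(ref state);
    }
}
```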
Shared components are so much better! I absolutely love unmanaged shared components. Everything is working in Burst now, and my major bottlenecks have mostly disappeared.
I still want ChunkDynamicBuffers. But more than that, I want chunk components in ComponentTypeSet to work with EntityManager’s NativeArray Add/Remove functionality. I get exceptions when the array of entities gets large enough to hit an asserting code path. I do appreciate the new EntityQuery builder APIs supporting chunk components.
Speaking of the EntityQuery builder API, it would be nice to have a fluent API for EntityQueryOptions instead of ORing verbosely named flags together.
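Here is the current shape of what I mean (a small sketch; the fluent names in the comment are hypothetical):

```csharp
using Unity.Collections;
using Unity.Entities;
using Unity.Transforms;

public partial struct VerboseQueryOptionsSystem : ISystem
{
    private EntityQuery m_query;

    public void OnCreate(ref SystemState state)
    {
        // Today: verbosely named flags ORed together. Something fluent like
        // .WithDisabledEntities().WithPrefabs() (hypothetical names) would read better.
        m_query = new EntityQueryBuilder(Allocator.Temp)
            .WithAll<LocalToWorld>()
            .WithOptions(EntityQueryOptions.IncludeDisabledEntities | EntityQueryOptions.IncludePrefab)
            .Build(ref state);
    }

    public void OnDestroy(ref SystemState state) { }

    public void OnUpdate(ref SystemState state) { }
}
```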
I still need to properly play with enableable components in the new prerelease. The APIs were lacking in the experimental release, but I think that is fixed in the prerelease. The behavior of WithAll/WithNone annoyed me when I first found out about it, but looking at the docs now, I realize that IgnoreComponentEnabledState makes the query consider archetypes rather than enabled states, which is simple enough to cover my use cases. Though I think allowing the enabled state of any individual ReadWrite component in the query to be ignored would cover the trickier use cases completely.
IAspect is something I tried and quickly ran into issues with, because the documentation and discoverability of its auto-generated methods are awful. I think an ArchetypeChunk.Has overload and maybe an ArchetypeChunk.DidChange overload would help a lot when combined with the new IJobEntityChunkBeginEnd.
System creation order is really problematic. Right now, the constraints only work if they are part of the same ComponentSystemGroup. Now normally, creation order doesn’t matter for me. But I have been running into issues trying to make a system dynamically add itself to a ComponentSystemGroup during OnCreate based on some configuration data. The problem is the ComponentSystemGroup hasn’t performed its system list allocations in OnCreate yet. Those system lists should really be lazy-initialized.
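Roughly the pattern that bites me, sketched with hypothetical names (per my reading, the group’s update lists aren’t allocated until its own OnCreate has run, so the registration fails):

```csharp
using Unity.Entities;

public partial class ConfigDrivenSystem : SystemBase
{
    protected override void OnCreate()
    {
        bool enabledByConfig = true; // stand-in for reading real configuration data

        if (enabledByConfig)
        {
            var group = World.GetOrCreateSystemManaged<SimulationSystemGroup>();
            // If the group hasn't been initialized yet, this registration blows up
            // because its system lists don't exist yet.
            group.AddSystemToUpdateList(this);
        }
    }

    protected override void OnUpdate() { }
}
```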
Then there’s singletons. Several years ago, this post happened in response to me proposing a “Collection Components” solution. Yet today, for some reason, this “collection component” behavior has been implemented specifically for singletons. Also, every singleton has to have an EntityQuery associated with it, which used to be a lot more problematic, though with systems now always running by default (a decision I really like, by the way), I’m less concerned. But why are these three concepts tied together? I don’t like them being tied together, so I implemented my own solutions in my framework, which handle the authoring workflow better too.
With that said, I originally planned to make a demo project showcasing a failure case I believed to exist based on the documentation and code. But I was surprised. By some insane combination of edge cases between the job safety system and the component type tracking, you made these water-tight. I’m impressed! There are still some potential performance ramifications, but those are rarer use cases in practice. So the only thing I don’t like with the design is the “only one” hard rule, since that by itself isn’t actually that useful when you truly analyze the problems singletons get applied to in a DoD paradigm.
I would really like to parallelize batch EntityManager structural change commands. MemCpys do get expensive on a single thread sometimes.
And lastly, there are bootstraps. I would really like a way to switch between bootstraps, so that sample scenes can ship with their own and they are easy to set without errors. I’ve suggested a ScriptableObject profile paradigm similar to SRPs, and that paradigm could also apply to baker control and even the Editor World.
Overall, I think the runtime feels like a prerelease, very close to excellent!
Scenes
I don’t have a lot to say about this. The tooling is great, and the APIs mostly seem to be there. I had an issue where the Editor World would get into a bad state, but I was able to restart it and restore subscenes into view. I was also able to force subscenes to load synchronously.
However, these things were not straightforward. The API is all over the place. Sometimes it requires static methods, and sometimes it requires adding or setting components on entities. Sometimes you need to iterate all of the scene section entities to operate on the whole subscene, and sometimes you don’t. It is confusing.
Blob Asset retention is also confusing. I wish blob assets were always retained a full frame after subscene unload so that cleanup systems can function properly. But I haven’t figured out how to get that to work.
It is definitely not excellent, but it is not unusable.
Transforms
The very first day 1.0 experimental dropped, I made a thread detailing some major concerns I had with the new system. None of the concerns were addressed in the prerelease. The change version race condition is still there. There’s still a synchronization pitfall with no tools to force synchronization when working with multiple transforms in a hierarchy. And PostTransformScale is an absolute mess.
From a matrix multiplication perspective, there is nothing “post” about PostTransformScale. It is just a local-space scale that is ignored by TransformAspect and can also be optionally ignored by children. It does not achieve the ParentScaleInverse behavior that allows children to stay attached to surface points of the parent. That means that for cinematic quality animations, users are stuck with alembic mesh caches. And for stretchy object chains, TransformAspect is unusable.
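To make the matrix argument concrete, here is my reading of the composition (a hedged sketch using Unity.Mathematics; the package’s internal math may differ in the details):

```csharp
using Unity.Mathematics;

static class TransformComposition
{
    // The post scale sits at the far right of the chain, so it only rescales this
    // entity's own local geometry. A child composed from the parent's uniform TRS
    // never sees it, which is why surface attachment (the old ParentScaleInverse
    // behavior) can't be expressed.
    public static float4x4 ChildLocalToWorld(float4x4 parentWorldTRS,
                                             float3 childTranslation,
                                             quaternion childRotation,
                                             float childUniformScale,
                                             float3x3 childPostTransformScale)
    {
        float4x4 childLocal = float4x4.TRS(childTranslation, childRotation, childUniformScale);
        float4x4 postScale = new float4x4(childPostTransformScale, float3.zero);
        return math.mul(parentWorldTRS, math.mul(childLocal, postScale));
    }
}
```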
I do like the move away from a matrix-based transform system. But this new version is way too limiting, and is causing a lot of problems. I am already planning on writing my own, which is frustrating given I provide animation and audio solutions that some people are starting to depend on in the absence of official Unity solutions.
It is hard for me to call Unity Transforms excellent when I currently believe GameObject Transforms are the better system right now, despite the fact that they don’t support DCC parity either.
But probably the most frustrating part of all of this is that the changes needed to support my use case would have a very low impact on performance and next to zero behavior differences. All of the really nice properties of Transforms V2 are preserved, with the added benefit of non-uniform scaling support for the common use cases. If you want the exact math and modifications required, just ask.
This is now my #1 concern.
Graphics
The BatchRendererGroup regression used to be my #1 concern. But that issue is fully resolved.
I like Entities Graphics a lot! The workflow and integration with ShaderGraph and custom material properties is incredible. The feature support keeps getting better. It absolutely scales.
There’s nothing that screams at me “who in their right mind would do that?” Is there potential to do things even better? Absolutely. I personally have a modified version that makes culling a full ComponentSystemGroup. In Entities 0.51, that had a downside of the overhead of systems. But in 1.0 with ISystems, that overhead is gone. I also don’t upload material properties for chunks that are fully culled, which is a massive optimization.
Now, skinned mesh rendering is a little rough. I can tell that you do not have a scalable animation system, because otherwise you wouldn’t have removed the linear blend skinning codepath. That codepath is significantly faster on the GPU, and there’s nothing you can do to make compute skinning beat it, because the performance difference is exclusively the extra memory operations to store the vertex in the compute shader and load it in the vertex shader. Sure, linear blend skinning in the vertex shader might not support all the features, but it is awesome for less-detailed LODs. There are other possible improvements you could make to skinned mesh rendering, but I’ve already implemented many of them in my framework, so you can just look at that if you need ideas.
A common complaint is the lack of any mechanism to create rendering entities from custom meshes and materials computed from a custom Baker. Once again, this is something I solved in my framework.
But despite all of these improvements I’ve been able to make, I believe this is exclusively because I have a fresh perspective. You have done an incredible job laying the groundwork. BRG is awesome.
Now onto the actual issues I am stumped with.
First, I’d like to figure out how to get skin matrices into uniform buffers for mobile GPUs. Adreno GPUs, which power Meta headsets, are especially annoying because they only have 8k of uniform memory. If you have any ideas, I’d love to hear them.
Could we please get a version of GraphicsBuffer.UnlockBufferAfterWrite() that works with jobs, so that the buffer can be unmapped when BatchRendererGroup completes the culling callback JobHandle? Right now, SRP evaluation is when all my worker threads are sitting idle, because draw call generation is cheap. I would like to use those threads more, and this change would help with that.
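For context, this is roughly what the upload looks like today (a sketch; it assumes the buffer was created with GraphicsBuffer.UsageFlags.LockBufferForWrite). What I want is for the fill to happen in jobs and the unlock to be tied to the culling JobHandle, which this API can’t express:

```csharp
using Unity.Collections;
using Unity.Mathematics;
using UnityEngine;

static class SkinMatrixUpload
{
    public static void Upload(GraphicsBuffer buffer, NativeArray<float3x4> skinMatrices)
    {
        // Main-thread only: map, copy, unmap. None of this can be deferred to a JobHandle.
        NativeArray<float3x4> dst = buffer.LockBufferForWrite<float3x4>(0, skinMatrices.Length);
        dst.CopyFrom(skinMatrices);
        buffer.UnlockBufferAfterWrite<float3x4>(skinMatrices.Length);
    }
}
```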
Overall, I would still say that Entities Graphics is excellent. It has taught me a lot about how to get the most out of Unity’s ECS. And even though there is still room to take it further (like what I’ve done), I don’t want that to detract from the amazing accomplishment this package is. And the way you handled the BRG regression on the forums was professional, clear, and informative.
Physics
Several years ago, I commented that I thought it was going to take you a while to fix issues and support the common use cases people would expect from a 1.0. Well, it took a while, but over the last year I’m seeing it come together and start to resemble that more ideal state. Scaling was a big one. But I have also seen you clean up the simulation pipeline and hooks.
Now for anything other than rigid body physics and static world queries, I still think it will have trouble scaling, given the many tradeoffs that were made. However, that’s not the reason I don’t use Unity Physics. The reason is that I need something that will integrate much more tightly with animation and AI systems. It doesn’t make sense for Unity Physics to pursue my use case.
My main criticisms come from “surprise behaviors”. An example is that raycasts ignore convex radius while collider casts don’t, meaning that a raycast and a spherecast against a corner of a very large cube will result in hit positions that differ by several units.
Another thing you might want to look into is reducing simulation latency for NetCode. I’ve seen a few too many profiler captures where very simple scenes are just too expensive.
But overall, for rigid body physics simulations and static queries, and only those two things, I would consider Unity Physics as excellent. However, if you want it to stretch into visual effects or heavy trigger-driven gameplay mechanics at scale, then the tradeoffs you made in the design fall apart. I am very curious what your plan for active ragdolls is.
NetCode
Every time I see someone do something with NetCode, I see them tackle it with a trial-and-error mindset, putting out fires, and constantly assuming that “if it works, it is correct”. That is the total opposite of how I reason about code. I want to know if something is correct in principle. I want to reason about a path to a solution. And while there will always be a little bit of trial and error, I try to keep the number of iterations to a minimum.
I have read through the NetCode documentation and all the learning resources multiple times. I understand what all the different tools are trying to do at the high level. But as soon as I try to reason about anything I want to create, I look at the example and think, “This shows I can do X, but why is X allowed? Would Y be a legal alternative? What if I want to do Z?”
It is frustrating, because at this point I feel like the only way I’m ever going to get past this point is to make massive wall-of-text posts on the NetCode subforums and have someone very technical give responses. Is that what I should do?
Most of my questions revolve around how to bound timing-related issues when coordinating specific types of gameplay events.
My instinct right now is that if I get past this block and make things “click”, I’m probably going to like NetCode a lot. Much of what is there seems heavily customizable and flexible, which I appreciate.
I can’t say if it is excellent or not, since I am simply unable to reason about it. But it feels close.
Audio
The only reason this is still on this list is because it is still available and still working in 2022.2, and for that, thank you! I use it for my high-level audio solution.
I still have no idea when AudioKernel allocations are valid. Should I even bother with them if my graph is mostly static? And I don’t fully know the rules regarding empty node ports. So if anyone does know and wants to drop in some tidbits, I’d appreciate it!
I would also love NativeArray-based APIs for reading and writing AudioClip assets.
But besides those asks, I actually really like what is there. Performance is good. I can’t say it is excellent, because it isn’t really supported. But it has the potential for excellence in the right hands.
Final Thoughts
Overall, I was very nervous about 1.0 experimental. The main-thread performance with unmanaged containers was very promising, but making my framework compatible was one of the toughest upgrades I have faced with DOTS. With that said, the prerelease solved a lot of my concerns, enough that I can start recommending the 1.0 prerelease over 0.51. If the next prerelease is as big of a jump as the one from experimental to this prerelease, then I think the official 1.0 release will be awesome!
As always, I want to thank the DOTS developers who are active on these forums for being active on these forums. I know I can be tough to listen to, since I build tech that seems to discredit what you are building. But I’ve stuck with DOTS since I started with it four years ago, and I’m still sticking with it. I only want to see DOTS continue to get better! You guys are always welcome to reach out to me, publicly or privately.
And lastly, thanks for making it to the bottom of this wall of text!