Unit testing with a DOTS project

So I’m a bit of a fanatic when it comes to unit testing, and in my previous non-DOTS project I just used the Test Runner. I’m curious how people run unit tests in DOTS, or what they are using.

Thanks in advance.


The Test Runner works really well with DOTS. The base class can just initialize the default world and add all systems as at runtime. Then all you do is set up entities and call World.Update.

You can think of it as testing a “unit”, though you are not running just the Update of that system. The system you want to test sits among other systems, which should not run given the entities present in the test. This makes the test extra tight and reveals more problems.
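
Here is a minimal sketch of that setup. It is a toy Python model, not Unity’s Entities API: World, create_entity, query, and the two systems are hypothetical stand-ins. The test builds a world with every system and creates only the entities it cares about. It then runs one world update and asserts. The other systems still run, but their queries match nothing in the test.

```python
# Toy stand-in for an ECS world (NOT Unity's Entities API): entities are dicts
# mapping component name -> value, systems are objects with update(world).

class World:
    def __init__(self, systems):
        self.entities = []
        self.systems = list(systems)  # already in update order

    def create_entity(self, **components):
        entity = dict(components)
        self.entities.append(entity)
        return entity

    def query(self, *names):
        # Entities that have every requested component, like an entity query.
        return [e for e in self.entities if all(n in e for n in names)]

    def update(self):
        # One world update runs every system in order, like World.Update.
        for system in self.systems:
            system.update(self)


class GravitySystem:
    def update(self, world):
        for e in world.query("Velocity", "Gravity"):
            e["Velocity"] -= e["Gravity"]


class HealthRegenSystem:
    def update(self, world):
        for e in world.query("Health", "Regen"):
            e["Health"] += e["Regen"]


def all_systems():
    # Stands in for "every system the default world would create".
    return [GravitySystem(), HealthRegenSystem()]


def test_regen_restores_health():
    world = World(all_systems())                 # default world + all systems
    e = world.create_entity(Health=10, Regen=2)  # only the entities this test needs
    world.update()                               # one world update
    # GravitySystem also ran, but its query matched nothing here.
    assert e["Health"] == 12
```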

Also crucial, though I had to write them myself, are assertion methods based on entities. EntityManager.CreateEntityQuery is the way to go since it is system-independent (call CompleteAllJobs manually before it, just to be sure). But it is quite verbose to go CreateEntityQuery → eq.GetComponentData… etc. just to check for existence, a value (easy if GetSingleton is possible, and you should design your test so that it is), or a count.

I wrote an assertion utility so I could do it in one line, e.g.: Assert.That(eqUtility.EntityCount<Tag1,Tag2,Scd1>(scdFilter1), Is.EqualTo(4))
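
Below is a toy Python version of such a one-line count helper. The name entity_count and the (name, value) form of shared_filter are assumptions for illustration; the utility discussed in this thread is generated C#. Entities are dicts from component name to value. A shared-component filter requires the component to be present with a matching value.

```python
# Hypothetical one-line count helper, loosely modelled on the quoted
# EntityCount<Tag1,Tag2,Scd1>(scdFilter1) call. Entities are dicts mapping
# component name -> value; shared_filter is a (component_name, value) pair.

def entity_count(entities, *components, shared_filter=None):
    count = 0
    for e in entities:
        if not all(c in e for c in components):
            continue  # missing one of the required components
        if shared_filter is not None:
            name, value = shared_filter
            if name not in e or e[name] != value:
                continue  # shared component absent or different value
        count += 1
    return count


entities = [
    {"Tag1": True, "Tag2": True, "Scd1": "laneA"},
    {"Tag1": True, "Tag2": True, "Scd1": "laneB"},
    {"Tag1": True, "Scd1": "laneA"},
]
# The equivalent of Assert.That(..., Is.EqualTo(1)) in one line:
assert entity_count(entities, "Tag1", "Tag2", "Scd1", shared_filter=("Scd1", "laneA")) == 1
```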

Do you have an example on your Git anywhere?

The base class for tests: https://github.com/5argon/EcsGadgets/blob/master/ECSTestBase.cs

The EntityManagerUtility I use for assertions: https://github.com/5argon/EcsGadgets/blob/master/EntityManagerUtility.gen.cs … It is a massive 7,000-line code gen covering 0 to 6 IComponentData and 0 to 2 ISharedComponentData (with or without using them as a filter).

Awesome, thank you!

@5argon Why are you adding all the systems in your ECS test base class? In my opinion, an ECS test should test (and use) just one system, not more.

It is easy: set up the test entities, run the system, and compare the entities with the expected output. Unit testing in ECS is much easier than in OOP. Unfortunately, it is the only thing that is easier.

At first I also did it like that, because it sounds like a “unit”. But some bugs are related to system ordering. I don’t want to write an almost identical test again where the only difference is that the system I want to check sits among other systems. Another bug involved an input entity that was not the same at runtime. Some prior system had taken it and forgot to restore its state for a particular system.

It is also a lot of work to cherry-pick the related systems for my test world (I would usually forget a new EntityCommandBufferSystem I had just made, etc.). I could no longer liberally add a ton of systems without turning all the tests red, because I had to remember to also add them to the tests. The most fun I have when creating systems is when I can declare a new class at the top and use the IDE to move it into its own file, and it just works. That is thanks to the expensive assembly scan and careful update groups, RequireForUpdate, etc. I think that cost exists precisely to enable something like this.

Having tests that care about which single system to test, rather than just using a single world, adds too much friction. Refactoring one system into several (each taking input from the previous one, like a conveyor belt) also breaks the tests that previously focused on that single system. That system alone can no longer complete its original objective, only a subset of it. If I write a world test and it is still green after the refactor, that instantly tells me my system refactor is valid.
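
The conveyor-belt refactor is sketched below in toy Python; all system and component names are hypothetical, not Unity’s API. A world test named after behaviour passes both before and after one damage system is split into two stages. A test that feeds the original input to only the second stage no longer passes.

```python
# Toy model (NOT Unity's API): entities are dicts, systems are functions, and
# one "world update" runs the given systems in order.

def run_world(systems, entities):
    for system in systems:
        system(entities)
    return entities

# Before the refactor: a single system clamps and applies damage.
def damage_system(entities):
    for e in entities:
        if "DamageRequest" in e and "Health" in e:
            e["Health"] -= max(0, e.pop("DamageRequest"))

# After the refactor: two stages, like a conveyor belt.
def clamp_damage_system(entities):
    for e in entities:
        if "DamageRequest" in e:
            e["PendingDamage"] = max(0, e.pop("DamageRequest"))

def apply_damage_system(entities):
    for e in entities:
        if "PendingDamage" in e and "Health" in e:
            e["Health"] -= e.pop("PendingDamage")

def world_test_damage_reduces_health(systems):
    # Named after behaviour, not after any system.
    hit = {"Health": 100, "DamageRequest": 30}
    negative = {"Health": 100, "DamageRequest": -5}  # must not heal
    run_world(systems, [hit, negative])
    return hit["Health"] == 70 and negative["Health"] == 100

def per_system_test_of_second_stage_alone():
    # The old per-system test, pointed at the stage that now applies damage.
    e = {"Health": 100, "DamageRequest": 30}
    apply_damage_system([e])
    return e["Health"] == 70  # False: this stage never sees DamageRequest
```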

For a while I thought about using a ComponentSystemGroup as the unit of testing, so I could avoid the hassle of collecting systems in each test. Then I realized: why not just use the default world?

The solution, therefore, is to use the default world and develop in a UPM package so the assembly scan is more limited. Since then I have given up on the system as the unit of testing. I moved the unit up to one world update, or a sequence of them. I name tests functionally rather than after a system. This still gives results similar to a single-system test, but with lower maintenance cost and more coverage.

My approach is easier because you set up test entities but call Update on the world instead of on the system, then compare. I don’t think there is any downside to this, other than not seeing the system’s name in the test case (?)

But this isn’t unit testing. Unit testing is about testing the smallest testable unit, and in ECS that is usually the system. With your approach you can hardly get deterministic results or easily see the problems in the code.

Yes, by definition it isn’t unit testing, but I found it useful. (So it was off topic in the first place? I don’t know what you would prefer as a unit.) Maybe I would call it integration testing, or X testing. I don’t care much about terms. It gives me enough confidence to skip some unit tests, since it does the same and more.

I think the real unit is the input entity and its data: “If the input is like this, then after one round of world update, it should become this. This holds no matter what other systems are present, as long as the world contains the systems I am interested in (though the test does not name them explicitly).” ← The world update helps satisfy the “no matter what” part.

Of course, if I had a lot of time I would write per-system unit tests plus this kind of test on top, but I am afraid the resulting cases would be too similar. For example, if some per-system test is red, the world test looking for the same thing is likely also red. I don’t want to edit two similar tests in a similar way every time something regresses.

There are still unit tests on structs, e.g. an IComponentData that implements CompareTo. But I no longer test each system (or want to) when developing a UPM package of systems.

Yes, the downside is that you don’t know exactly which system caused the problem, because they are all present in every test. However, a red test (more than 50% of the time in my game) usually comes from one of these:

  • an undisposed container or a duplicated component;
  • setting a component that hasn’t been added yet;
  • bad things in a Burst-compiled function;
  • an empty query where you expected something;
  • a tag component used where data is required;
  • a singleton lookup that finds 0 or more than 1 entity;
  • wanting to write but being passed read-only;
  • the IL gen error in 0.3 that was fixed in 0.4.

In all those cases, the error points me to the correct system in the pile of updates. I found that it preserves the system pinpointing you would get from a unit test very well, while giving me a lot more benefits.

As far as I can see, there is only one troublesome case where your per-system approach is better: when a system does not update at all because of a mismatched archetype. There is no error until I look closely at each system’s update criteria. I realize this weakness, but the coverage benefits are still worth it. (Plus, if the test name is good enough, I still know which systems to look at. Maybe not a single one, but usually no more than 3 or 4 systems are involved in the test’s objective.)

You mean the test changes its result as you add more seemingly unrelated systems? I am rather thankful when that happens: adding another system alerts me that a regression occurred. That is the whole point of doing it like this.

Arguably, it is bad determinism when a design decision made elsewhere causes trouble for the system you are testing while the tests stay green. For example, you introduce a system that cleans up tags but forget to add UpdateBefore to the system that needs to look at the tag. I want the test for the system that uses the tag to go red. For it to go red in this case, the test needs to update the world instead of just one system.
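
The tag-cleanup scenario can be reproduced in a few lines of toy Python. The names are hypothetical and this is not Unity’s API; the list order stands in for what UpdateBefore/UpdateAfter would decide. The per-system test passes. The world test with the accidental ordering fails, and that is exactly the red test the post asks for.

```python
# Toy model (NOT Unity's API). A "Hit" tag is read by score_system and removed
# by hit_cleanup_system. Cleanup must run after the reader, but here the
# ordering declaration was forgotten and it ends up first.

def hit_cleanup_system(entities):
    for e in entities:
        e.pop("Hit", None)

def score_system(entities):
    for e in entities:
        if "Hit" in e:
            e["Score"] = e.get("Score", 0) + 1

# Buggy order, standing in for a missing UpdateBefore on score_system.
WORLD_ORDER = [hit_cleanup_system, score_system]

def per_system_test():
    e = {"Hit": True}
    score_system([e])  # only the system under test runs
    return e.get("Score", 0) == 1

def world_test(order=WORLD_ORDER):
    e = {"Hit": True}
    for system in order:  # one full world update
        system([e])
    return e.get("Score", 0) == 1
```

Here per_system_test() returns True, and world_test() returns False until the order becomes [score_system, hit_cleanup_system].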

You could argue that a unit test makes sure the system works perfectly on its own, and a green test assures that. But most of the time systems are never meant to work alone. The exception is a system with [DisableAutoCreation], in the fashion of a conversion system. Such a system is thrown into a processing world, manually ticked with a single Update, and then its result is grabbed. For that system, I agree that the system is the right unit for testing. Thinking like this, I feel that part of the time I put into per-system unit tests was wasted. It covered a situation the system could never actually be in (being alone).

Personally, I find debugging individual systems relatively easy. Undisposed arrays and duplicated components are easy to find without any unit testing. Most of the time the debugger shows where the problem is, or at least roughly.

Then we can test methods individually.

The challenge I found, however, is running multiple systems. Add multithreading and tons of entities, and suddenly the order of job execution changes. When jobs aren’t scheduled correctly, results can vary a lot.

I don’t think simple unit testing can detect such cases, as opposed to testing at normal runtime.

NOTHING can cause trouble for the system I am testing. For instance, my system expects two components, X and Y, as input and produces a Position2D component as output. I write a test that verifies this behaviour and I am done. Nothing can cause trouble for this system. Whether the input is correct or not is not the system’s concern.
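
For contrast, here is a minimal per-system test of the kind described, as a toy Python sketch. The rule that Position2D is simply (X, Y) is an assumption, since the post doesn’t say how the output is computed. Component types are used as dictionary keys, loosely mimicking typed components.

```python
from dataclasses import dataclass

# Toy components (NOT Unity's API); the X/Y -> Position2D rule is hypothetical.
@dataclass
class X:
    value: float

@dataclass
class Y:
    value: float

@dataclass
class Position2D:
    x: float
    y: float

def position2d_system(entities):
    # Only entities that carry both inputs qualify, like a two-component query.
    for e in entities:
        if X in e and Y in e:
            e[Position2D] = Position2D(e[X].value, e[Y].value)

def test_position2d_from_x_and_y():
    e = {X: X(3.0), Y: Y(-1.5)}
    position2d_system([e])  # the system under test, fed directly
    assert e[Position2D] == Position2D(3.0, -1.5)
```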

I really try to imagine your testing workflow, but it is really hard. For example, when I start writing a new game, I can start with any system I wish. Or even better, I can start with the test for such a system. I know exactly what the input and the output (of a single system) should be, so I can easily write assertions.

But how can you write assertions for the whole world at the beginning of development? You can’t; no human can. So you waste your time refining the assertions with every system added.

You can probably write some “world” tests, but I don’t see them as a replacement for unit testing.

It is hard to eat soup with a knife, but that’s not the knife’s fault 🙂

It goes like this. Say I want to start a project that simulates cars on a road. At the beginning of development, I would start with one system that looks for entities with Car and Direction. It adds to their Position according to the Direction and the time elapsed in the past frame.

After finishing “CarMovementSystem”, I would write a test named CarAlwaysMoveForwardIfDirectionPresent. The test name says that some system among all those in the world moves the car forward.

In this test, the first few lines create some entities with Car, different Directions, and an empty Position. I add all systems to this world, then call .Update on the world once. (I also throw in a system that modifies time, so the test can predict where each car should end up.) Then I assert each entity’s position.
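
Here is a toy Python sketch of that test, with hypothetical names and not Unity’s API. fixed_time_system plays the role of the system that modifies time, pinning the step to 0.5 s so the expected positions can be worked out by hand. A car without Direction checks that nothing else moved it.

```python
# Toy model (NOT Unity's API): systems are functions of the world, run in order.

class World:
    def __init__(self, systems):
        self.systems = systems
        self.entities = []
        self.delta_time = 0.0

    def update(self):
        for system in self.systems:
            system(self)

def fixed_time_system(world):
    world.delta_time = 0.5  # hypothetical fixed step, easy to work out by hand

def car_movement_system(world):
    for e in world.entities:
        if "Car" in e and "Direction" in e:
            dx, dy = e["Direction"]
            px, py = e.get("Position", (0.0, 0.0))
            e["Position"] = (px + dx * world.delta_time, py + dy * world.delta_time)

def test_car_always_move_forward_if_direction_present():
    world = World([fixed_time_system, car_movement_system])  # "all systems"
    cars = [
        {"Car": True, "Direction": (1.0, 0.0), "Position": (0.0, 0.0)},
        {"Car": True, "Direction": (0.0, 2.0), "Position": (0.0, 0.0)},
    ]
    parked = {"Car": True, "Position": (0.0, 0.0)}  # no Direction: must not move
    world.entities = cars + [parked]
    world.update()
    assert cars[0]["Position"] == (0.5, 0.0)
    assert cars[1]["Position"] == (0.0, 1.0)
    assert parked["Position"] == (0.0, 0.0)
```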

This is why I can assert the whole world at the beginning of development (there is only one system right now). Did I misunderstand something?

*Edit: Right, and this is why I moved away from the per-system approach in the first place. I want the previous tests to go red when the system they test no longer works correctly among the others. I don’t want them to stay green just because the system still works perfectly fine on its own.

The real issue is that the input may pass through prior systems before arriving at the one you want to test. That can throw off your test, so you don’t want to do it this way. But I think how previously placed systems may mangle the input is the concern of the system. After all, that is what guides you to write the right GetEntityQuery. While designing this one system, you already expect some prior processing to make the query match. Suppose some prior system interferes with the test (which would be green if tested in isolation). Then I have caught a bug: a wrong GetEntityQuery inside the system I am testing. Unless a system has [DisableAutoCreation], I view it as something that can’t be fed an input entity directly, as in the test. (It depends on the purpose, but [DisableAutoCreation] usually indicates more manual use, which a per-system test captures perfectly.)

An awareness of what should come before or after is part of the design of each individual system, in the form of [UpdateBefore] and [UpdateAfter] on its declaration, and a world test can test it. So view a system as a unit whose definition includes its relationship to other systems. Testing that unit thoroughly may then not look like a unit test by definition, because the test must include the others.
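
How per-system ordering declarations turn into one world order can be sketched with Python’s graphlib (3.9+). This toy only orders a flat list. Unity’s real sorting happens inside each ComponentSystemGroup with its own rules, so treat the dictionary format below as an assumption. A world-level check then verifies every declared before/after relationship.

```python
import graphlib  # standard library, Python 3.9+

# Toy ordering declarations, standing in for [UpdateBefore]/[UpdateAfter].
SYSTEMS = {
    "HitDetectionSystem": {"before": ["ScoreSystem"], "after": []},
    "ScoreSystem": {"before": ["HitCleanupSystem"], "after": []},
    "HitCleanupSystem": {"before": [], "after": []},
}

def sort_systems(systems):
    # graphlib expects node -> set of predecessors (things that must run first).
    graph = {name: set() for name in systems}
    for name, rules in systems.items():
        for other in rules["after"]:
            graph[name].add(other)                    # name runs after other
        for other in rules["before"]:
            graph.setdefault(other, set()).add(name)  # other runs after name
    # Raises graphlib.CycleError if the declarations contradict each other.
    return list(graphlib.TopologicalSorter(graph).static_order())

def world_test_order_respects_constraints(systems):
    order = sort_systems(systems)
    pos = {name: i for i, name in enumerate(order)}
    for name, rules in systems.items():
        assert all(pos[name] < pos[o] for o in rules["before"])
        assert all(pos[name] > pos[o] for o in rules["after"])
    return order
```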

Inevitably, in the world test approach the input looks like input to the world rather than to the system. You have to set up the test as if you weren’t aiming at that system specifically, though the test name says what you are trying to confirm. That is the difficulty of testing this way. A per-system test has an easier time because you know the input goes directly into the system. Use whatever you are most comfortable with.

But in the past I found that while feeding input straight to a system made tests easier to write, it missed too much of what the Unity ECS library is structured around. Systems linger in the world, are auto-populated and ordered, and depend on built-in systems such as time or rendering, etc. I ended up covering almost every system again with similar but broader tests, to make sure the system code was really 100% usable. Those doubled tests are too much to maintain, and the broader test usually goes red in a useful way. The per-system one only goes red when I deliberately remove something completely. Even then, it only reminds me to clean up the tests that depend on it. It rarely goes red to alert me of wrong behaviour.

And now you add a “TurboSystem” that makes the car’s position at the end of the frame different, and your test fails. So you try to adjust the assertions to fix it. Maybe that sounds easy with two or three systems, but with hundreds of them?

I think you mean this. The second system is a “CarDecelerationSystem”, which adds a rule for a car close behind another car. The car behind gets limited movement, so that in the next frame it does not end up overlapping the car ahead.

In this case, my previous test may go red if the starting entities were too close and the new system affects them, since I wasn’t expecting this proximity rule. Then I must edit the test.

If I do it your way, adding this second system causes no trouble at all. Is this the problem you are referring to? If so, the problem is not that no human can assert the whole world early in development. Rather, the world changes as you develop, and you fear it will wreck previous tests.

The problem is that “CarAlwaysMoveForwardIfDirectionPresent” is not true anymore (literally, since the car must now slow down or stop in some cases). It is correct that I have to adjust the assertion. But is that bad? I don’t write a test with the objective of it staying green forever. I would rename this test to “CarAlwaysMoveForwardGivenEnoughDistancesBetweenThem”. The test now accounts for two systems at the same time, though its objective is still testing the first movement system. The only modification is that the test entities now have large distances between them. Then I would write similar tests for CarDecelerationSystem where some cars are close together. I would still call a single world update, then check that they did slow down. This test would be named “CarDeceleratesToAvoidCrash”. Now suppose a third system makes a car jump over other cars when it is close behind. If that turns CarDeceleratesToAvoidCrash red because the position ends up in the air, then so be it. Literally, the car no longer decelerates to avoid a crash; it jumps over. A green test that verifies just the deceleration system would then no longer be very useful.
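
Here is a toy 1D Python sketch of the two renamed tests. The deceleration rule, DT, and MIN_GAP are assumptions made up for illustration. Both tests run the full two-system world once; only the starting distances differ.

```python
# Toy 1D model (NOT Unity's API): cars drive along +x. The deceleration system
# runs first and caps each car's speed so it cannot close within MIN_GAP of
# the car ahead this frame; the movement system then integrates position.

DT = 0.5       # hypothetical fixed time step
MIN_GAP = 2.0  # hypothetical minimum distance between cars

def car_deceleration_system(cars):
    for car in cars:
        ahead = [c["x"] for c in cars if c["x"] > car["x"]]
        if not ahead:
            continue
        gap = min(ahead) - car["x"]
        max_speed = max(0.0, (gap - MIN_GAP) / DT)
        car["speed"] = min(car["speed"], max_speed)

def car_movement_system(cars):
    for car in cars:
        car["x"] += car["speed"] * DT

WORLD = [car_deceleration_system, car_movement_system]

def update(cars):
    for system in WORLD:  # one world update
        system(cars)

def test_car_always_move_forward_given_enough_distances_between_them():
    cars = [{"x": 0.0, "speed": 4.0}, {"x": 100.0, "speed": 4.0}]
    update(cars)
    assert cars[0]["x"] == 2.0 and cars[1]["x"] == 102.0

def test_car_decelerates_to_avoid_crash():
    behind, front = {"x": 0.0, "speed": 10.0}, {"x": 3.0, "speed": 0.0}
    update([behind, front])
    assert behind["speed"] < 10.0
    assert front["x"] - behind["x"] >= MIN_GAP
```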

The test was wrecked because there was a regression.

With per-system unit tests, a test named a bit too literally (but true to the test’s environment) would be “CarMoveForwardAfterJustCarMovementSystemUpdated”. It stays green even when the second system is added. I would rather have the first kind, which goes red and forces me to change the test case, because this test being green does not give me much confidence.

The per-system test was not wrecked because it was written too tightly (i.e. the small(est) unit of test). It would still need broader tests, which should go red when covering this, before the system could be used in production.

*Added: Yeah, this is what I meant, and wanted. I have to fix it since the test is not true anymore. It no longer asserts what the program wants to do. Now that more systems are present, the program does not want to just move cars mindlessly. The previous tests may well stay green because their entity setup doesn’t include TurboActive. On the flip side, they may go red because they lacked TurboInactive and went way further than expected. That’s an even better reason for the red. I very much want hundreds of them to go red if that happens.

But if there are, say, 10 systems that affect position, how can you make an assertion? How will you calculate the expected position for the assert? Will you calculate it in your head? Something like: OK, the starting position is 0,0, speed is 80, brakes are on, turbo is off, there is another car ahead, the terrain is sand, the car is a Ferrari, so the position at the end of the frame should be 10,2. And will you repeat that for the case where turbo is on? And where turbo is on and the starting position is negative? And where turbo is on and the terrain is grass?

I will tell you what you will probably do: you will not calculate the target position at all. You will just take the actual result from your failing test and replace the expected result with it. So if there is a bug in your code, you will propagate it into the expected result.

Even doing it like this, you simply cannot cover all the possible combinations of components that affect the target position.

Good point; the real problem then is that the result is hard to predict. My current solution is to find a neutral input.

I don’t want to let go of the whole-world coverage advantage. So I set up the simplest case that can still (indirectly) pinpoint the desired system among all other systems. For example, to test the turbo system, everything else should be off: normal terrain, no other cars, etc. The car just has the turbo. The tires should have all modifiers at 1, whatever helps everything stay neutral. I am still aiming to observe the behaviour of a single system in this test, but it is not a unit test. (This neutrality is set up in a [SetUp] that affects only the tests that look at one system.)

What about turbo while another car is ahead? Then it is no longer a test of just this system. Two kinds of test would both miss this case anyway. One is a per-system test of just the turbo. The other is a world test targeting a single system, set up so everything is neutral except the turbo. Covering this case requires a proper integration test. So writing all the permutations is outside our debate between per-system tests and world tests that look at a single system; it needs to be done separately.

Another example from my own case: it is a music game. In all my tests except the ones that check how BPM affects things, the BPM is always 120. It is divisible by 60, which makes expected results easy to write.
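
The arithmetic behind that choice, assuming beats are counted from t = 0 at a constant tempo:

```latex
t_{\text{beat}} = \frac{60\ \text{s}}{\text{BPM}} = \frac{60\ \text{s}}{120} = 0.5\ \text{s},
\qquad t_n = n \cdot t_{\text{beat}} \;\Rightarrow\; t_8 = 4\ \text{s}
```

At 120 BPM, every beat lands on a half-second, so expected times in a test are easy to compute by hand.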

The remaining question is maybe whether this is simply impossible in large games (does it not scale?). I can only guess at two ways it could work. One is carefully limiting the starting entities amid a massive number of systems. The other is developing more UPM packages that test in their own assemblies, which preserves the world-testing advantage. My game is not large enough to confirm this (but it works so far). The per-system way could be better (than multiple UPM packages with world tests?). I will let you know when my approach fails, but I am afraid my game will never get to that point.

That is the fault of an inexperienced coder. You simply must improve, and not get lazy and fall into that and propagate the bug. You should look at the assert and ask whether it is really so. Most of the time you will arrive at the same result as asserted, and then it is fine to type it in as the assertion. The simpler setup I mentioned should help you calculate a precise expected result.

But the real issue you are trying to convey is that this way of testing raises the difficulty too much. It promotes laziness in calculating the correct answer, until the only way out is to type in the answer just to make the test green (a useless green). This turns into a scaling problem again and depends on the project. I can’t answer it, because so far I can still calculate the answers normally in my project.

If it is still too hard and there is really no way out, then I think you have a very good reason to do per-system tests as usual. I would too, if it got that hard and eroded the world coverage I get in return. There is no need at all to stick to just one methodology. (I would still first try to make things testable with the world by default, the same way some program designs are testable and others are not.)

As I said, finding a neutral point for some parameters reduces the permutations you must write. For example, to test the turbo system among all the others you aren’t interested in, you need just two cases, turbo on and off, while everything else stays predictable. You don’t need to vary all the others to test the turbo system. Varying them would be a test case for those other systems, with turbo instead locked to a neutral off. You then get a coverage pattern similar to per-system tests. It certainly does not cover everything, but it covers more than testing systems individually, at the expense of setting up neutrals.
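
Here is a toy Python sketch of the neutral-input idea. The three systems, their fields, and the 1.5× turbo multiplier are all hypothetical. neutral_car plays the role of the [SetUp]: every other modifier sits at its neutral value. Only two turbo cases are needed, and the expected speeds can be computed by hand.

```python
# Toy model (NOT Unity's API): several systems scale a car's speed. Each
# modifier has a neutral value (1.0, or "off"), so a test aimed at one system
# sets every other modifier to neutral and varies only that one.

def terrain_system(car):
    car["speed"] *= car["terrain_factor"]  # 1.0 = normal road (neutral)

def tire_system(car):
    car["speed"] *= car["tire_grip"]       # 1.0 = neutral tires

def turbo_system(car):
    if car["turbo"]:
        car["speed"] *= 1.5                # hypothetical turbo multiplier

WORLD = [terrain_system, tire_system, turbo_system]

def neutral_car():
    # Plays the role of the [SetUp]: everything neutral, no other cars.
    return {"speed": 10.0, "terrain_factor": 1.0, "tire_grip": 1.0, "turbo": False}

def run_frame(car):
    for system in WORLD:  # the whole world runs, not just turbo_system
        system(car)
    return car["speed"]

def test_turbo_off_keeps_base_speed():
    assert run_frame(neutral_car()) == 10.0

def test_turbo_on_boosts_speed():
    car = neutral_car()
    car["turbo"] = True
    assert run_frame(car) == 15.0
```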

Anyway, it is not that I must always do world tests, even though I prefer them. If the remaining problem is just the difficulty of predicting results, I can still fall back to testing a single system. That tests a unit of development rather than actual game behaviour. But so far the setup for each test is still not overwhelming (yet).

Any “not possible” argument is usually about scaling. And again, my game may never get overwhelming enough for me to come back and report it. So I guess it ultimately comes down to this:

  • You believe the per-system test scales indefinitely because the unit is a system. You scale the game with more systems, and this approach scales with it: the tests aren’t blown up by the presence of other systems, and each new system only adds its own tests.
  • I believe that as the game scales with more systems, each system still works on a limited set of components per entity, so the world tests should stay manageable. If they do blow up, I will create more UPM packages to counter it, and add inter-UPM tests later.

https://www.youtube.com/watch?v=k8ws_APXilE

Sorry, I couldn’t help myself 🙂

OK 5argon, I really don’t see any advantages in your testing workflow, but maybe it fits the project you are working on, I don’t know.

I am sorry my explanation didn’t make it obvious that there are some advantages (alongside the disadvantages, there is at least one advantage). For example, it tests a system’s position before or after other systems, because the tests break when more systems are added. That is one of them, and it is the appeal of doing it. The difficulty of predicting results is one of the disadvantages.

It still fits the project right now. I’m not sure about the future (scaling problem).