I have a relatively large struct which includes 12 enum fields. Sometimes I pass thousands of these in an array to a Burst function.
The enums only have a small number of values, so I figured instead of using int as the underlying type, I could use an sbyte to reduce the overall size and fit more of them into the cache. When I made this change, my stress test lost FPS, dropping from ~250 FPS down to ~220. What could be causing the performance drop here? Are non-int enums marshalled back to int again or something?
I think CPUs are generally less comfortable working with data smaller than 32 bits. In my experience, assembly code that uses data smaller than 32 bits tends to widen it into a 32-bit register (sign- or zero-extension) before doing anything with it. So that could explain the performance hit.
EDIT
Have you tried combining some of these enums into a single enum with bit flags? Maybe that could be faster with the magic that Burst does?
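A rough sketch of what that packing could look like, in case it helps. The enum names and bit widths here are made up for illustration, and since each enum has more than two values this uses small bit fields with shifts and masks rather than a true `[Flags]` enum. Whether it is actually faster under Burst is untested; it trades a few shift/mask instructions for a smaller struct.

```csharp
// Hypothetical enums; each has at most 4 values, so 2 bits are enough.
public enum ShapeKind : byte { Circle, Square, Triangle }
public enum ColorKind : byte { Red, Green, Blue, Yellow }
public enum SizeKind  : byte { Small, Medium, Large, Huge }

// Three enums stored in a single byte: bits 0-1 shape, 2-3 color, 4-5 size.
public struct PackedTraits
{
    public byte Bits;

    const int ColorShift = 2, SizeShift = 4, Mask = 0b11;

    public static PackedTraits Pack(ShapeKind shape, ColorKind color, SizeKind size)
    {
        int bits = (int)shape | ((int)color << ColorShift) | ((int)size << SizeShift);
        return new PackedTraits { Bits = (byte)bits };
    }

    // Each getter shifts its field down to bit 0 and masks off the rest.
    public ShapeKind GetShape() => (ShapeKind)(Bits & Mask);
    public ColorKind GetColor() => (ColorKind)((Bits >> ColorShift) & Mask);
    public SizeKind  GetSize()  => (SizeKind)((Bits >> SizeShift) & Mask);
}
```

With 12 enums at 2 bits each you would need 24 bits, so a single `uint` would hold all of them, assuming none of your enums needs more than 4 values.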
This depends on the struct layout and access patterns. Could be related to data alignment. Data aligned to 8-byte boundaries can be faster to access.
It could be a revealing exercise to compare the Burst output for the two enum types. (It’s convenient to diff the two outputs using Rider/VS Code/[your favorite diff tool]).
So it’s definitely just a data packing issue and not something Burst is doing? I still find it really surprising. The smaller struct is probably half the size overall, so it should be able to process twice the number of items per cache line loaded, even if there is some wasted padding. I might have to run some experiments just to confirm this.
Hard to say, but what Burst does is influenced by the data layout. I recommend comparing the output in the Burst inspector.
How are you accessing the data? Are you reading all of the values in the struct, skipping some, or accessing them randomly? Have you considered trying a SoA layout?
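For reference, here is a minimal sketch of the difference between the two layouts. All type and field names are hypothetical, and it uses plain managed arrays so it runs outside Unity; in a Burst job you would presumably use one `NativeArray` per field instead.

```csharp
public enum Team : sbyte { Red, Blue }
public enum State : sbyte { Idle, Moving, Dead }

// AoS (array of structs): every element carries all fields, so a loop that
// only reads State still drags Team and Health through the cache.
public struct UnitAoS
{
    public Team Team;
    public State State;
    public float Health;
}

// SoA (struct of arrays): one tightly packed array per field.
public sealed class UnitsSoA
{
    public readonly Team[] Teams;
    public readonly State[] States;
    public readonly float[] Healths;

    public UnitsSoA(int count)
    {
        Teams = new Team[count];
        States = new State[count];
        Healths = new float[count];
    }

    // Touches only the States array: one byte per unit, read contiguously.
    public int CountInState(State wanted)
    {
        int n = 0;
        for (int i = 0; i < States.Length; i++)
            if (States[i] == wanted) n++;
        return n;
    }
}
```

SoA mostly pays off when a loop reads only a few of the fields; if every pass reads all 12 enums of every element, the gain is likely smaller.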
It does seem like it should be faster because of CPU cache optimization, especially if everything else is equal. Maybe it’s stopping Burst from vectorizing some array access?
I think you could get a good idea of what’s happening here if you looked at the asm in the Burst inspector. If there aren’t many differences between the two cases, I think it could be the way CPUs handle 8-bit data. There’s a lot of info out there about modern CPUs being slower at operations on data smaller than 32 bits. We’re talking about a difference of roughly 0.5 ms per frame here (1/220 s ≈ 4.55 ms versus 1/250 s = 4.00 ms), so a small number of extra CPU cycles per call could be enough if your stress test is big enough.
You need to work with powers of 2. Your struct is essentially 12 bytes. You can add an int on the end for padding, or you can explicitly declare the size of the struct, with field offsets if needed. Change the size to 16 bytes and see if that makes a difference, assuming you are passing a NativeArray. If you call UnsafeUtility.SizeOf() and it returns 12 rather than 16, there’s part of the issue.
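Something like the following shows the idea, as a sketch under a few assumptions: the struct contains only the 12 enum fields (the real one apparently has more), the enum and struct names are made up, and it uses `Unsafe.SizeOf<T>()` from plain .NET so it runs outside Unity. Inside Unity, `UnsafeUtility.SizeOf<T>()` would be the equivalent check.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for one of the 12 small enums.
public enum Flag : sbyte { Off, On, Auto }

// 12 one-byte fields with sequential layout: 12 bytes, alignment 1.
[StructLayout(LayoutKind.Sequential)]
public struct Packed12
{
    public Flag F0, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11;
}

// Same fields, with the total size explicitly forced up to 16 bytes.
[StructLayout(LayoutKind.Sequential, Size = 16)]
public struct Padded16
{
    public Flag F0, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11;
}

public static class LayoutCheck
{
    public static void Main()
    {
        // Print both sizes so the two layouts can be compared directly.
        Console.WriteLine($"Packed12: {Unsafe.SizeOf<Packed12>()} bytes");
        Console.WriteLine($"Padded16: {Unsafe.SizeOf<Padded16>()} bytes");
    }
}
```

Note that padding to 16 bytes gives up some of the cache density the smaller enums were meant to buy, so it is worth profiling both variants rather than assuming either one wins.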