Clarification on ComputeBuffer.SetData layout requirements?

Hi, I’m looking for a little clarification on the general-case layout requirements when passing structs to ComputeBuffer.SetData with reliable cross-platform support. Looking around online, in some cases I see people doing it without specifying LayoutKind at all, and in other cases I see people simply summing the sizes of individual fields without paying any attention to things like alignment and padding.

What are the actual requirements here?

Does a struct with LayoutKind.Sequential reliably match the same struct definition in HLSL on all platforms?

Hi!

No.

If you need cross-platform compatibility, don’t use 3-component vectors. Vectors should be sorted by the number of components, from highest to lowest. If you have an array, use 4-component vectors inside.
The safest option is to always use 4-component vectors everywhere.
All of this is in addition to sequential layout, of course.


I think for constant buffers you need to use what is called std140 layout in OpenGL. That’s a cross-platform layout as far as I know.
You can use 3-component vectors, but you always have to align them on 16-byte boundaries and pad them with a 4th component.
So you could do something like this:

float3 MyVec3;
float Padding;

float3 AnotherVec3;
int MyInt;

float2 MyVec2;
float2 AnotherVec2;

float2 YetAnotherVec2;
int AnotherInt;
float AnotherFloat;

That’s how it would look on the CPU side. On the GPU, if you have two 3-component vectors next to each other, the padding is implicit.
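To sanity-check the stride of a padded struct like the one above, here is a minimal sketch in Python (not Unity code) that packs the same fields with the standard `struct` module. The format string, `STRIDE` and `pack_element` are names I made up for illustration, and it assumes 4-byte floats and ints, little-endian.

```python
import struct

# Hypothetical format string mirroring the padded CPU-side struct above.
# "<" turns off Python's own alignment, so only the explicit fields count.
LAYOUT = "<" + "".join([
    "3f",  # float3 MyVec3
    "f",   # float  Padding
    "3f",  # float3 AnotherVec3
    "i",   # int    MyInt
    "2f",  # float2 MyVec2
    "2f",  # float2 AnotherVec2
    "2f",  # float2 YetAnotherVec2
    "i",   # int    AnotherInt
    "f",   # float  AnotherFloat
])

# Size in bytes of one element, i.e. the stride you would hand to the buffer.
STRIDE = struct.calcsize(LAYOUT)


def pack_element(values):
    """Pack one element (16 numbers in declaration order) into raw bytes."""
    return struct.pack(LAYOUT, *values)
```

With these fields the stride comes out as a multiple of 16 bytes, and every float3 starts on a 16-byte boundary, which is the property the padding is meant to guarantee.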

    When using the "std140" storage layout, structures will be laid out in
    buffer storage with its members stored in monotonically increasing order
    based on their location in the declaration. A structure and each
    structure member have a base offset and a base alignment, from which an
    aligned offset is computed by rounding the base offset up to a multiple of
    the base alignment. The base offset of the first member of a structure is
    taken from the aligned offset of the structure itself. The base offset of
    all other structure members is derived by taking the offset of the last
    basic machine unit consumed by the previous member and adding one. Each
    structure member is stored in memory at its aligned offset. The members
    of a top-level uniform block are laid out in buffer storage by treating
    the uniform block as a structure with a base offset of zero.

      (1) If the member is a scalar consuming <N> basic machine units, the
          base alignment is <N>.

      (2) If the member is a two- or four-component vector with components
          consuming <N> basic machine units, the base alignment is 2<N> or
          4<N>, respectively.

      (3) If the member is a three-component vector with components consuming
          <N> basic machine units, the base alignment is 4<N>.

      (4) If the member is an array of scalars or vectors, the base alignment
          and array stride are set to match the base alignment of a single
          array element, according to rules (1), (2), and (3), and rounded up
          to the base alignment of a vec4. The array may have padding at the
          end; the base offset of the member following the array is rounded up
          to the next multiple of the base alignment.

      (5) If the member is a column-major matrix with <C> columns and <R>
          rows, the matrix is stored identically to an array of <C> column
          vectors with <R> components each, according to rule (4).

      (6) If the member is an array of <S> column-major matrices with <C>
          columns and <R> rows, the matrix is stored identically to a row of
          <S>*<C> column vectors with <R> components each, according to rule
          (4).

      (7) If the member is a row-major matrix with <C> columns and <R> rows,
          the matrix is stored identically to an array of <R> row vectors
          with <C> components each, according to rule (4).

      (8) If the member is an array of <S> row-major matrices with <C> columns
          and <R> rows, the matrix is stored identically to a row of <S>*<R>
          row vectors with <C> components each, according to rule (4).

      (9) If the member is a structure, the base alignment of the structure is
          <N>, where <N> is the largest base alignment value of any of its
          members, and rounded up to the base alignment of a vec4. The
          individual members of this sub-structure are then assigned offsets
          by applying this set of rules recursively, where the base offset of
          the first member of the sub-structure is equal to the aligned offset
          of the structure. The structure may have padding at the end; the
          base offset of the member following the sub-structure is rounded up
          to the next multiple of the base alignment of the structure.

      (10) If the member is an array of <S> structures, the <S> elements of
           the array are laid out in order, according to rule (9).
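Rules (1) to (3) above can be sketched as a small offset calculator. This is only a rough illustration in Python: it handles scalars and vectors, ignores arrays, matrices and nested structs, and the function names are my own, not from any library.

```python
def base_alignment(components, n=4):
    """Rules (1)-(3): scalar -> N, vec2 -> 2N, vec3 and vec4 -> 4N."""
    return {1: n, 2: 2 * n, 3: 4 * n, 4: 4 * n}[components]


def std140_offsets(members, n=4):
    """Return ({name: offset}, end) for a list of (name, components) members.

    n is the size of one component in bytes (4 for float and int).
    Trailing struct padding from rule (9) is not applied here.
    """
    offsets, cursor = {}, 0
    for name, components in members:
        align = base_alignment(components, n)
        # Round the base offset up to a multiple of the base alignment.
        cursor = (cursor + align - 1) // align * align
        offsets[name] = cursor
        cursor += components * n
    return offsets, cursor


# A vec3 followed by a scalar, then two more vec3s.
offsets, end = std140_offsets([("a", 3), ("b", 1), ("c", 3), ("d", 3)])
```

In this example the scalar `b` lands in the 4 bytes right after `a`, while `d` gets pushed to the next 16-byte boundary after `c`, which is the implicit padding mentioned earlier.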

Correction: Unfortunately, this isn’t true for Metal, as aleksandrk pointed out below.


Thanks for the reply!

The documentation could be a little clearer in this area, and this seems like a prime candidate for a C# analyzer! It would be cool if the compiler could check all types passed to SetData<T> and other similar methods for reasonable layouts. It’s pretty dangerous to have an API like that which implies that any type will work, when the restrictions are actually so heavy.

That’s a great resource @c0d3_m0nk3y, thanks!

On Metal the size of float3 is 16 bytes. So if you add padding the way you described, the layout will be broken. There is a packed float3 type that is 12 bytes, but it’s recent enough to not be supported on all relevant devices.
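To illustrate why the explicit padding breaks, here is a rough Python sketch that computes member offsets twice: once assuming float3 occupies 12 bytes, and once assuming it occupies 16 bytes as described for Metal. The size and alignment tables are my assumptions for this example (float3 aligned to 16 bytes in both cases), and `member_offsets` is a hypothetical helper.

```python
def member_offsets(members, float3_size):
    """Offsets for (name, type) members; only the size of float3 varies."""
    # Assumed sizes and alignments in bytes.
    sizes = {"float": 4, "int": 4, "float2": 8,
             "float3": float3_size, "float4": 16}
    aligns = {"float": 4, "int": 4, "float2": 8,
              "float3": 16, "float4": 16}
    offsets, cursor = {}, 0
    for name, type_name in members:
        align = aligns[type_name]
        cursor = (cursor + align - 1) // align * align  # round up
        offsets[name] = cursor
        cursor += sizes[type_name]
    return offsets


# First four fields of the padded example from earlier in the thread.
MEMBERS = [("MyVec3", "float3"), ("Padding", "float"),
           ("AnotherVec3", "float3"), ("MyInt", "int")]

twelve_byte = member_offsets(MEMBERS, float3_size=12)
sixteen_byte = member_offsets(MEMBERS, float3_size=16)
```

With a 12-byte float3 the second vector starts at byte 16, matching the CPU-side struct. With a 16-byte float3 the padding float starts a new 16-byte slot and the second vector moves to byte 32, so the two sides no longer agree.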


Definitely :)
We have something like that in the backlog.


If we use 1, 2, and 4 component types (mixing and matching between floats and ints) is that guaranteed to be good?

Cool, I’d love to see it hit the frontlog, as it were. Perhaps if you could provide a technically precise specification of the behaviour (including all of the little details like that Metal thing, which I wouldn’t have known from your first reply), the community could build it for now? I’d be happy to make it. With all due respect, when I hear a feature is in the Unity backlog (especially a feature relating to a years-old API) I assume it won’t appear anytime soon.

I suppose the main thing we would need here is a comprehensive list of the layouts of all HLSL types, including information about how they may vary across all platforms, including with nested structs.

No, each member has to be aligned to its own type’s width. That is, if you have

float2 a;
float b;
float2 c;

you’ll have 4 bytes of padding between b and c, because the start offset of a float2 needs to be a multiple of the size of a float2 (8 bytes).
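The arithmetic for that example can be sketched in a few lines of Python. It assumes each 1-, 2- or 4-component member is aligned to its own size, as described above; `SIZES` and `natural_offsets` are names made up for this illustration.

```python
# Assumed sizes in bytes; each type is aligned to its own size here.
SIZES = {"float": 4, "int": 4, "float2": 8, "float4": 16}


def natural_offsets(members):
    """Return ([(name, offset), ...], end) for (name, type) members."""
    offsets, cursor = [], 0
    for name, type_name in members:
        size = SIZES[type_name]
        cursor = (cursor + size - 1) // size * size  # align to own size
        offsets.append((name, cursor))
        cursor += size
    return offsets, cursor


declared = [("a", "float2"), ("b", "float"), ("c", "float2")]
# Same members, widest first (sort is stable, so a stays before c).
widest_first = sorted(declared, key=lambda m: SIZES[m[1]], reverse=True)

declared_offsets, declared_end = natural_offsets(declared)
sorted_offsets, sorted_end = natural_offsets(widest_first)
```

In declaration order, c ends up at byte 16 with a 4-byte gap after b. With the widest members first, the same three fields pack with no gap between them, which is the reason for the ordering advice given earlier in the thread.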

Let me check, I believe someone was actually looking into it.


As long as they’re arranged largest to smallest that should be good then, right?

Cool! Are analyzers like this the type of thing that would make their way into older versions, or would this be a 2023+ thing? Obviously an analyzer is theoretically an easy drop in for an old version. I’ll probably just build one myself soon actually, it shouldn’t be hard at all. But it would be good for others if it was done by default!

Thanks for pointing that out. Didn’t know that.


Yes.

Technically, it’s a feature, and we normally don’t backport those.
Enabling it by default would be great, but it needs to be done in several stages. Otherwise if someone updates to a newer version, they would suddenly start getting errors/warnings out of nowhere.

Looks like it has been deprioritised, unfortunately :(

Ah well, I’d be lying if I said I didn’t expect that outcome. Thanks anyway.