Does vertex layout (e.g. splitting into multiple streams) matter for GPU performance?

For example, is there a good reason to put the position/normal/tangent in one stream, the bone weights/indices in another, and the UVs in a third? Is there an optimization there if the vertex shader simply passes some of the data (e.g. the UVs) straight through to the output to be interpolated, while other data, like the position, actually needs to be transformed/processed by the vertex program?


Yep, it should. It's the same idea as organizing your memory layout in a SIMD-friendly manner in data-oriented design (in Unity's case, how components are packed in a chunk). Splitting your streams well is actually the recommended approach; for example, there are specific stream-splitting recommendations for tile-based GPUs (typically the ones found on Android).
You can see similar suggestions in GPU Gems for DirectX: splitting data into streams reduces the memory touched by different passes and types of rendering.


IIRC we tried this at some point, and our tests didn’t show a performance benefit that was worth it.


In the general case, probably. IMO the benefit of splitting data into streams really depends on the complexity and fill of the rendered frame, and it can give you really good gains if you optimize the streams specifically for your case (especially on mobile platforms), no?

I think it's more to do with how the info is read or written. Parts of the data that are repeatedly edited together can go in one stream, so they sit in sequence in memory. Most of the speed, though, comes from grouping the same types together, or from using types that are smaller in size.

For example, on the CPU side, accessing x and y alternately every iteration:

for (int idx = 0; idx < 10000; idx++)
{ x = someData[idx]; y = someData2[idx]; }

Compare that to two separate passes over x and y, which should be faster on the CPU since it isn't jumping between x and y every iteration:

for (int idx = 0; idx < 10000; idx++)
{ x = someData[idx]; }

for (int idx = 0; idx < 10000; idx++)
{ y = someData2[idx]; }

Think it’s a similar concept.

We were testing this on mobile, and we tried a workload that was really heavy on vertex processing.


I think the problem is that vertex workloads are usually such a small fraction of the pixel work that the impact of stream splitting is negligible.

Thanks all! Looks like it's not a significant enough difference to justify complicating my dynamic mesh generation code, but I'll keep it in mind.