# Select a color component to use?

Input to my shader is an RGBA texture. In the fragment shader I want to output ONE of the 4 color components, chosen by the UV2.x coordinate (which holds pixel/window coordinates) modulo 4. That is, a pixel in column 0 outputs the Alpha component, column 1 outputs Red, column 2 outputs Green, and column 3 outputs Blue.

I can compute the column with a simple modulo like `fmod()`, but then it seems I need a chain of nested IF statements to pick the component: if the modulo is 0 do this, else if it is 1 do that, else if it is 2 do something else, and so on. That seems inefficient.

Is there a faster way to do this? Some kind of swizzle indexing of the component based on a number? Converting the rgba components to an array then using an integer index?

Maybe something like this would work:

• Let’s say X is the result of your modulo (as an integer)
• _source is your source color
• col is the output
```
col.a = (1 - X) * _source.a;
col.r = (1 - abs(X - 1)) * _source.r;
col.g = (1 - abs(X - 2)) * _source.g;
col.b = (1 - abs(X - 3)) * _source.b;
```

Assuming negative values in col are OK (I’m not completely sure, but in some tests I did a while ago they didn’t seem to be a problem and behaved like 0), this should give you the results you want.
I think it’s better than if statements, but there may be better solutions.

That seems like it involves even more operations than before though.

I suppose one thing I could do is split the geometry into thin 1-pixel columns and assign 4 separate shaders, one for each modulo value. Each shader could then output its color component directly, with no branching or modulo test. I think that should be faster, unless all the extra triangles and separate shaders add significant overhead?

Splitting the geometry into single-pixel slices with 4 shaders, or using if statements on the modulo of the pixel coordinate, are both likely slower than just doing all of the math all of the time and lerping.

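4 shaders means context switching and a lot more polygons.
If statements mean conditional branching, which is pretty fast on modern hardware, but a branch whose outcome changes from one pixel to the next is the worst case for it: neighboring pixels are shaded together in groups, and a group whose pixels take different paths has to execute every one of those paths.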

So, unless you’re doing a lot of math that is uniquely different per pixel, just do all the math all the time.

OK, I’ll try it and see how the speed is. Thanks.

Is it possible to convert an RGBA float4 into like an array of 4 floats, so that you can index it with an integer?

No need to convert, vectors are already arrays.

```
float4 foo = float4(1.1, 2.2, 3.3, 4.4);
float a = foo.x;  // a == 1.1 using .xyzw
float b = foo.g;  // b == 2.2 using .rgba
float c = foo.p;  // c == 3.3 using .stpq
float d = foo[3]; // d == 4.4 using [n]
```

`xyzw` are the default vector component accessors, intended for positions and generic variables.
`rgba` are the most commonly used alternatives, intended for texture or other color values.
`stpq` are the almost completely unknown alternatives to the above, intended for UV coordinates. These exist in GLSL and Cg, but not HLSL. Since most people use HLSL documentation as a stand-in for Cg documentation, they miss that these exist unless they come from a GLSL background.

In most cases the above options are the best because they allow easy swizzling (`.xxx`, `.zyx`, etc.). Internally they all map back to `xyzw` in the compiled shader, but they’re nice shorthand for showing what kind of information a value holds, even though it’s really all the same data. Note that you cannot mix different accessor sets in the same swizzle; `.xgp` will fail! You can, however, use different accessor sets on either side of an assignment or comparison: `foo.xyz = bar.rgb` or `foo.xyz == foo.rgb`.

`[n]` allows direct access to each value, and is most commonly seen with explicit arrays (`float foo[4] = {1.1, 2.2, 3.3, 4.4};`), but it also works with vectors and matrices. You cannot swizzle or grab multiple components in-line with it, though: `foo[0][1]` is an error for `float foo[4]` or `float4 foo`, because that syntax is for multi-dimensional data types such as matrices (`float4x4 foo`), vector arrays (`float4 foo[4]`), or nested arrays (`float foo[4][4]`).

Thanks for the help. It turns out that indexing, e.g. `foo[3]`, runs quite a bit slower than the way I was doing it with IF statements.

Something like the below code might end up faster than the if statements, especially on some older hardware.

```
fixed4 data = tex2D(_DataTex, UV.xy);
half channel = floor(fmod(UV2.x, 4.0)); // 0, 1, 2 or 3
fixed output = dot(data.argb, saturate(-abs(channel - half4(0.0, 1.0, 2.0, 3.0)) + 1.0)); // .argb: column 0 = alpha, 1 = red, 2 = green, 3 = blue
```

A little explanation:
The first line is self-explanatory: it gets the texture data.
The second line gets us a value of 0, 1, 2, or 3 by flooring the fmod.
The third line is the magic.

A dot product is an easy way to add a bunch of values together, since it’s highly optimized on GPUs.
`foo.x + foo.y + foo.z + foo.w`
is slower than
`dot(foo, float4(1.0,1.0,1.0,1.0))`
On modern hardware a dot product can be done in a single cycle, whereas the three adds take a cycle each. Even on older hardware the dot product is probably going to be two cycles, not three.

So now the `saturate(-abs(channel - half4(0.0, 1.0, 2.0, 3.0)) + 1.0)` part. This is probably best explained by a Wolfram Alpha link:
http://www.wolframalpha.com/input/?i=-abs(x+-+(0.0,+1.0,+2.0,+3.0))+++1.0+with+x+=+0+to+4
Basically, each component becomes 1 minus the distance between channel and that component’s index, so it peaks at 1 where the two match and drops to 0 and then to negative values elsewhere; `saturate` clamps the negatives to 0. Because channel is floored, the values you get back are actually only zero or one, but adding floor to the Wolfram Alpha link makes it harder to understand. So a channel value of 0 results in half4(1.0, 0.0, 0.0, 0.0), a channel value of 1 results in half4(0.0, 1.0, 0.0, 0.0), and so on.
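Written out, with c the floored channel value, k the component index, and data_k the k-th component of the vector passed to `dot`, the mask and the final output are:

```latex
m_k = \operatorname{saturate}\bigl(1 - \lvert c - k \rvert\bigr), \qquad k \in \{0, 1, 2, 3\}

c \in \{0, 1, 2, 3\} \;\Rightarrow\;
\lvert c - k \rvert = 0 \text{ if } k = c, \text{ else } \lvert c - k \rvert \ge 1
\;\Rightarrow\;
m_k = \begin{cases} 1 & k = c \\ 0 & k \neq c \end{cases}

\text{output} = \sum_{k=0}^{3} \text{data}_k \, m_k = \text{data}_c
```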

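The `saturate` and `abs` are both “free” (on many GPUs they are applied as modifiers on other instructions rather than as separate instructions), so that entire line is just 3 cycles even with the dot product. The second line, just `floor(fmod(UV2.x, 4.0))`, might be slower!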

Thanks for taking the time to write and explain this. I did a test of this code on my main computer vs my original IF-based code, and somehow the two still run at the same speed. I presume the operations are just getting optimized so much that it’s more or less dependent on texture fetches. Your code does work, btw, which is cool. I might have to try it on a slower computer to see if there’s any difference there. No matter what else I’ve tried, the IF statements somehow end up being the fastest solution already.