Clip instruction not vectorized on OpenGL (ES)

I was having some problems porting a shader from HLSL/DirectX to OpenGL ES. Since it was a full-screen effect, I focused on problems with the inverted y coordinate in screen space (UNITY_UV_STARTS_AT_TOP).

It seems, however, that the real issue was that the clip instruction is not vectorized on OpenGL ES. This might also be the case for desktop OpenGL, but I'm not sure yet.

What I'm doing is essentially displaying a scaled version of a RenderTexture in a full-screen pass. Wherever the uv coordinates go below 0 or above 1, I clip the output. This works fine on DirectX:

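float2 uv = constructYourUV(); // placeholder for your own UV computation
clip(uv);                      // discard if uv.x < 0 or uv.y < 0
clip(1.0 - uv);                // discard if uv.x > 1 or uv.y > 1
return tex2D(map, uv);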
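To make the intended semantics concrete, here is a small CPU-side C++ model (not shader code). It assumes HLSL's documented rule that clip discards the pixel when any component of its argument is negative; the float2 struct and the function names are hypothetical stand-ins, and "discard" is modelled as returning true.

```cpp
#include <cassert>

// Hypothetical stand-in for HLSL's float2.
struct float2 { float x, y; };

// Component-wise "scalar - vector", as in HLSL's 1.0 - uv.
float2 operator-(float s, float2 v) { return {s - v.x, s - v.y}; }

// HLSL clip(): discard if ANY component is less than zero.
bool clipDiscards(float2 v) { return v.x < 0.0f || v.y < 0.0f; }

// The DirectX version from the post: clip(uv); clip(1.0 - uv);
// Discards exactly when uv lies outside [0, 1] on either axis.
bool discardedOnDirectX(float2 uv) {
    return clipDiscards(uv) || clipDiscards(1.0f - uv);
}
```

With a vector argument, a single clip covers both axes, which is why two calls are enough on DirectX.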

On OpenGL ES, this code only clips in the x direction, because clip is not vectorized there. The compiler gives no warning about the implicit reduction in vector width. This version works fine on OpenGL ES:

float2 uv = constructYourUV();
clip(uv.x);
clip(uv.y);
clip(1.0 - uv.x);
clip(1.0 - uv.y);
return tex2D(map, uv);
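The difference between the observed OpenGL ES behaviour and the per-component workaround can be sketched with another CPU-side C++ model. It assumes, based only on the symptom described above, that the generated code tests just the first component; float2 and all function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for HLSL's float2.
struct float2 { float x, y; };

// Scalar clip: discard when the value is negative.
bool clipScalar(float v) { return v < 0.0f; }

// Assumed OpenGL ES result of clip(uv); clip(1.0 - uv);
// only the x component reaches the test.
bool discardedXOnly(float2 uv) {
    return clipScalar(uv.x) || clipScalar(1.0f - uv.x);
}

// The per-component workaround from the post: four scalar clips.
bool discardedPerComponent(float2 uv) {
    return clipScalar(uv.x) || clipScalar(uv.y) ||
           clipScalar(1.0f - uv.x) || clipScalar(1.0f - uv.y);
}
```

A uv of (0.5, -0.5) shows the bug: the x-only version keeps the pixel, while the per-component version discards it.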

Is this a known limitation of the OpenGL variant of the clip instruction? I'm using #pragma glsl, by the way.

Edit: It seems GLSL only has an unconditional discard instruction, so this HLSL code:

clip(x);

is probably converted into this GLSL:

if (x < 0.0) discard;

which indeed is not vectorized.

You usually can't vectorize a condition, because the result is ambiguous: a vector comparison gives one result per component, while an if needs a single true or false. I'm quite surprised it works with clip, but according to the docs clip discards if any component is negative, so it should translate to something like:

if (any(uv < 0.0)) discard;

So this is most likely a bug in the HLSL → GLSL compiler, and you should report it.
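Here is a minimal C++ model of that expected translation, reducing a component-wise comparison to one bool with any(). It is a sketch of the semantics only; the names lessThanScalar, anyTrue and clipDiscards are hypothetical, and std::array stands in for a shader vector of any width.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Component-wise "v < s", like a vector comparison in a shader.
template <std::size_t N>
std::array<bool, N> lessThanScalar(const std::array<float, N>& v, float s) {
    std::array<bool, N> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = v[i] < s;
    return r;
}

// any(): true if at least one component is true.
template <std::size_t N>
bool anyTrue(const std::array<bool, N>& b) {
    for (bool c : b)
        if (c) return true;
    return false;
}

// Expected translation of clip(v): if (any(v < 0.0)) discard;
template <std::size_t N>
bool clipDiscards(const std::array<float, N>& v) {
    return anyTrue(lessThanScalar(v, 0.0f));
}
```

The any() reduction is what removes the ambiguity: the per-component results are collapsed into one decision before the branch.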

By the way, it should be a bit faster to merge the two clips into a single discard, like this:

if (any(uv != saturate(uv))) discard;
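The saturate trick works because clamping to [0, 1] changes a component only if it lies outside that range. The C++ model below checks it against the two-clip version; saturate here is a hypothetical re-implementation with std::clamp (C++17), not the HLSL intrinsic itself, and float2 is again a stand-in.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for HLSL's float2.
struct float2 { float x, y; };

// Models HLSL saturate(): clamp each component to [0, 1].
float2 saturate(float2 v) {
    return {std::clamp(v.x, 0.0f, 1.0f), std::clamp(v.y, 0.0f, 1.0f)};
}

// any(uv != saturate(uv)): true if clamping changed any component.
bool outsideUnitSquare(float2 uv) {
    float2 s = saturate(uv);
    return uv.x != s.x || uv.y != s.y;
}

// The two-clip version for comparison: clip(uv); clip(1.0 - uv);
bool twoClips(float2 uv) {
    return uv.x < 0.0f || uv.y < 0.0f ||
           1.0f - uv.x < 0.0f || 1.0f - uv.y < 0.0f;
}
```

Both functions agree for finite inputs, including the edge values 0 and 1, which are kept.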

Well, an if condition can't be vectorized, but I know from HLSL experience that clip can be; that's why the issue wasn't immediately clear to me. You can, however, vectorize a conditional expression like this:

float2 x = float2(0.0, 1.0);
float2 y = x + (x > 0.5 ? 1.0 : 0.0);

That should make y equal to (0.0, 2.0). The big question is whether an approach like this gives any speed-up from vectorization.
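In HLSL, ?: with a vector condition selects per component, which is what the snippet above relies on. A small C++ model of that per-component select follows; float2 and addWhereGreater are hypothetical names, and the threshold parameter generalizes the 0.5 from the snippet.

```cpp
#include <cassert>

// Hypothetical stand-in for HLSL's float2.
struct float2 { float x, y; };

// Models: float2 y = x + (x > 0.5 ? 1.0 : 0.0);
// Each component chooses 1.0 or 0.0 on its own; there is no single bool.
float2 addWhereGreater(float2 x, float threshold) {
    return {x.x + (x.x > threshold ? 1.0f : 0.0f),
            x.y + (x.y > threshold ? 1.0f : 0.0f)};
}
```

For x = (0.0, 1.0) and a threshold of 0.5, only the second component passes the test, giving (0.0, 2.0).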

It does seem like a bug in the HLSL → GLSL compiler; that's why I'm reporting it here.

I like your use of saturate to combine the below-0 and above-1 cases. That should indeed be a bit faster.