I have a simulation running in a custom render texture of format ARGBFloat (32 bits per channel).
I would like to store more than four values in that texture, and I noticed that my simulation still ran fine when I reduced the texture format to ARGBHalf (16 bits per channel). So I thought I could keep the ARGBFloat (32 bit) format but store two halves (16 bits each) in each channel.
Basically, I would like to pack two halves (16 bit floats) into a single channel of the texture in order to store more than one piece of information per channel.
I tried the following functions:
float PackFloats(float a, float b) {
    // Packing: quantize each 0-1 value to a 16 bit integer and
    // store both in the bits of a single 32 bit float.
    uint aScaled = a * 65535.0f;
    uint bScaled = b * 65535.0f;
    uint abPacked = (aScaled << 16) | (bScaled & 0xFFFF);
    return asfloat(abPacked);
}

void UnpackFloat(float input, out float a, out float b) {
    // Unpacking: reinterpret the float as a uint, split it back into
    // the two 16 bit integers, then rescale each to the 0-1 range.
    uint uintInput = asuint(input);
    a = (uintInput >> 16) / 65535.0f;
    b = (uintInput & 0xFFFF) / 65535.0f;
}
But I seem to lose too much precision in the uint conversion.
So I was wondering if there is a way to avoid any conversion and simply store two halves, bit for bit, in a float. If the float is a 32 bit array, is it possible to simply copy one half as a 16 bit array into the first 16 bits of the float and the other into the last 16 bits? Then read the float back as a 32 bit array and get the two halves back?
The problem is there’s not really a clean, easy way to convert between 16 bit floats and 32 bit floats. It’s not that a 16 bit float only uses the first 16 bits of a 32 bit float, or that you can make a 16 bit float from a 32 bit float by skipping the last 16 bits. In fact, the values between 0.5 and 1.0 in a 32 bit float can change 24 bits: all 23 bits of the mantissa, and 1 of the 8 bits used for the exponent.
Functionally a floating point number has 3 parts: the sign, the exponent, and the mantissa. The sign is self explanatory, but the exponent and mantissa can be confusing at first. Really the exponent just gives a starting number as 2 raised to some power, and the mantissa is the linear fraction between that starting number and that number * 2.
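For illustration, here is a small sketch of how those three parts sit in the bit pattern of a 32 bit float (SplitFloat is just an example name, not something you need for the packing itself):

// Split a 32 bit float into its sign, exponent and mantissa bits.
// IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits.
void SplitFloat(float f, out uint sign, out uint exponent, out uint mantissa) {
    uint bits = asuint(f);
    sign     = bits >> 31;          // top bit
    exponent = (bits >> 23) & 0xFF; // next 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF;     // low 23 bits, fraction between 2^n and 2^(n+1)
}

// For example, 0.5 as a 32 bit float is 0x3F000000 (exponent field 126, mantissa 0),
// while 0.5 as a 16 bit half is 0x3800, so the upper 16 bits of the 32 bit
// pattern (0x3F00) are not the half representation.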
The problem is that means converting between a 16 bit float and a 32 bit float requires more than just masking values. It’s possible, and hardware that supports both has the functionality to do so, but GPUs aren’t guaranteed to have that hardware.
On the other hand, the difference between a 16 bit uint and a 32 bit uint is just using the lower 16 bits of the value, at least for the range of values they both share. And modern GPUs are guaranteed to support uint to float conversions. Hence why a lot of packing examples you’ll see for shaders convert to uint.
Now, the thing you can do to get better precision is to apply some additional math to the data before converting to the uint, and vice versa. For example you could try uint aScaled = sqrt(saturate(a)) * 65535.0; and a = pow((uintInput >> 16) / 65535.0, 2.0); which is relatively inexpensive but will give you better precision for the smaller values.
Another alternative would be: uint aScaled = log2(saturate(a) + 1.0) * 65535.0;
and: a = pow(2.0, (uintInput >> 16) / 65535.0) - 1.0;
I believe that’ll more closely match the precision of an actual floating point value.
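As a rough, untested sketch, the sqrt variant dropped into your PackFloats / UnpackFloat functions would look something like this:

float PackFloats(float a, float b) {
    // Remap with sqrt before quantizing so small values keep more precision.
    uint aScaled = sqrt(saturate(a)) * 65535.0;
    uint bScaled = sqrt(saturate(b)) * 65535.0;
    uint abPacked = (aScaled << 16) | (bScaled & 0xFFFF);
    return asfloat(abPacked);
}

void UnpackFloat(float input, out float a, out float b) {
    uint uintInput = asuint(input);
    // Undo the sqrt remap by squaring the unpacked 0-1 value.
    a = pow((uintInput >> 16) / 65535.0, 2.0);
    b = pow((uintInput & 0xFFFF) / 65535.0, 2.0);
}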
I understand that you cannot just truncate a 32 bit float to get a 16 bit float.
My idea was more to use the 32 bit float just as a 32 bit array to store two 16 bit floats. I believe this is doable in C# (like this for example) and I was wondering if it would be possible to do something similar in HLSL:
1. Create a 32 bit array F.
2. Convert 16 bit floats A and B to 16 bit arrays.
3. Concatenate A and B and store them in F.
4. Store F in a 32 bit float texture value.
5. Then unpack by doing the opposite.
Basically, the 32 bit texture value would only be used as a storage array and never really interpreted as a float. But I did not find anything on how to read or write bytes directly in a shader, so I was wondering if it was possible.
I will test with your higher precision conversion to see if I get better results.
You can totally do that. The hard part is still the conversion between 32 bit and 16 bit floats. Most GPUs obviously support various 16 bit texture formats, and can do the conversions from the texture format to the 32 bit float the shader sees. But they don’t always have support for doing those conversions in the shader, hence the problem.
There are the f32tof16 and f16tof32 functions you might want to try. I’ve not tried them myself, but they should do exactly what you want, though I’ve seen some complaints that they don’t work on some GPUs the way they should.
float PackFloats(float a, float b) {
    // Packing: convert each 32 bit float to its 16 bit half representation
    // (f32tof16 returns the half bits in the low 16 bits of a uint),
    // then store both halves in the bits of a single 32 bit float.
    uint a16 = f32tof16(a);
    uint b16 = f32tof16(b);
    uint abPacked = (a16 << 16) | b16;
    return asfloat(abPacked);
}

void UnpackFloat(float input, out float a, out float b) {
    // Unpacking: reinterpret the float as a uint, split it into the two
    // 16 bit halves, and convert each back to a 32 bit float.
    uint uintInput = asuint(input);
    a = f16tof32(uintInput >> 16);
    b = f16tof32(uintInput);
}
To ensure you’re getting the packed data correctly, you’ll probably want to make sure the render texture is set to point filtering, and/or use _MyTex.Load(int3(uv * _MyTex_TexelSize.zw, 0)) instead of _MyTex.Sample(sampler, uv) or tex2D(_MyTex, uv). If you’re using the latter, you’ll also need to declare the texture uniform in the shader as Texture2D _MyTex; instead of sampler2D _MyTex; in order to use Load(). Just be mindful that the Load() function takes an integer pixel location (plus a mip level) and not a float UV, hence my little example code there multiplying by the texture resolution.
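A minimal sketch of what that might look like, assuming the UnpackFloat function above and Unity’s usual _MyTex_TexelSize uniform (LoadPacked is just an example name):

Texture2D _MyTex;        // Texture2D, not sampler2D, so Load() is available
float4 _MyTex_TexelSize; // Unity sets this to (1/width, 1/height, width, height)

void LoadPacked(float2 uv, out float a, out float b) {
    // Load() takes an integer pixel coordinate plus a mip level, not a UV.
    int2 pixel = int2(uv * _MyTex_TexelSize.zw);
    float packedValue = _MyTex.Load(int3(pixel, 0)).r;
    UnpackFloat(packedValue, a, b);
}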
Thanks a lot! The f32tof16 and f16tof32 do exactly what I need indeed and work perfectly in my case!
I am currently using _MyTex.Sample(sampler, uv) with point filtering, but it seems like Load() should do the same thing, only faster, since I do the bilinear filtering myself on the unpacked data.
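Something like this rough sketch is what I have in mind for the manual bilinear step on one of the unpacked values, building on the Load() approach above (SampleBilinearA is just a placeholder name):

// Bilinearly filter the 'a' value by unpacking the four neighbouring texels
// and blending them manually.
float SampleBilinearA(float2 uv) {
    float2 texel = uv * _MyTex_TexelSize.zw - 0.5; // weights relative to texel centers
    int2 p = (int2)floor(texel);
    float2 f = frac(texel); // blend weights

    float a00, a10, a01, a11, unusedB;
    UnpackFloat(_MyTex.Load(int3(p + int2(0, 0), 0)).r, a00, unusedB);
    UnpackFloat(_MyTex.Load(int3(p + int2(1, 0), 0)).r, a10, unusedB);
    UnpackFloat(_MyTex.Load(int3(p + int2(0, 1), 0)).r, a01, unusedB);
    UnpackFloat(_MyTex.Load(int3(p + int2(1, 1), 0)).r, a11, unusedB);

    return lerp(lerp(a00, a10, f.x), lerp(a01, a11, f.x), f.y);
}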