Treating Bytes[] as just Bits

Hi,

So working on my compression algorithm involving jobs and Burst. One function that has given me brain fuzz is how to treat the bytes as bits fluently at the lowest level. Although will need to rethink when working with more bits, crossing over multiple bytes, and storing longer sizes than uint (for match/replace).

Say for a simple example I have some memory of size 2 bytes / 16 bits.

00101101 10011010
      ^^ ^
Offset 7, number of bits 3

GetBits  x00 x00 x00 00000011
SetBits  x00 x00 x00 00000101

00101110 10011010
      ^^ ^

Besides my multiple job algorithms, I asked GPT4 to assist with this part. While it will take me some time to digest all that is happening here with bit operations (I’m sure Burst will optimise it nicely), was hoping some bit wizard could look over it to see if it is correct. The challenge is spilling from one byte to the next (out of bounds already handled in the job).

private static unsafe uint GetBits(byte* data, int bitOffset, int numBits)
{
    // Determine which byte contains the first bit of interest
    int byteIndex = bitOffset / 8;
    
    // Determine the position of the first bit of interest within the byte
    int startBit = bitOffset % 8;
    
    // Start by extracting the bits from the first byte
    uint result = data[byteIndex] >> startBit;
    
    // Check if the bits of interest span across the current byte into the next byte
    if (startBit + numBits > 8)
    {
        // If so, also extract bits from the next byte
        // Shift the bits from the next byte to align with the bits from the first byte
        result |= data[byteIndex + 1] << (8 - startBit);
    }
    
    // Mask out any extra bits we don't need by applying a bitmask
    return result & ((1 << numBits) - 1);
}
private static unsafe void SetBits(byte* data, int bitOffset, int numBits, uint value)
{
    // Determine which byte contains the first bit to be set
    int byteIndex = bitOffset / 8;
    
    // Determine the position of the first bit to be set within the byte
    int startBit = bitOffset % 8;
    
    // Create a bitmask to clear the bits we want to replace in the first byte
    int mask = (1 << numBits) - 1;
    
    // Clear the relevant bits in the first byte, then set them with the corresponding bits from value
    data[byteIndex] = (byte)((data[byteIndex] & ~(mask << startBit)) | ((value & mask) << startBit));

    // Check if the bits to be set span across the current byte into the next byte
    if (startBit + numBits > 8)
    {
        // If so, handle the bits that overflow into the next byte
        data[byteIndex + 1] = (byte)((data[byteIndex + 1] & ~(mask >> (8 - startBit))) | (value >> (8 - startBit)));
    }
}

In case you don’t know: there are BitField32/64 and NativeBitArray.

1 Like

I’m aware but that doesn’t cover crossing boundaries. Also thought it better to unroll into the exact requirement as better optimisation happens then. Might work better again if I treat the bytes as uint/4 bytes, although I’d have to factor a partial uint on the final bytes < 4.

1 Like

Why is that a requirement? You didn’t say what this compression algorithm is for, perhaps it’s very specialized. Or perhaps you are just wasting time not having tried Gzip or RLE (or others) first to see if they match your use case. Both are readily available, and at least the latter can be bursted or may even have a bursted implementation. :wink:

Perhaps this BitStream implementation, although almost 20 years old, may provide some clues. Or this one, only 9 years old. Interesting that this doesn’t come up often as a thing in searches. Curious: Unity even had a BitStream back in v5 or so.

I’d rather go with NativeStream. It’s just a matter of filling up bytes with your bits, and for decompressing you have a marker how many remaining bits need to be considered in the last byte. Not sure if that fits your algorithm though.

I’m always on some radical misadventure, so you could say specialized hehe! I’m working on my own idea of making bits more patternable/more 0s with minimal cost in <4 dimensions on the bits, together with <4 dimension matching algorithms. I might fail, but the fun is in trying :slight_smile: