LZMA compression and decompression for Unity

Last time I needed general-purpose byte array compression I used LZF, but now I needed something heavier, so this time I went for LZMA. I took the code from the latest LZMA SDK and it worked fine in Unity. I just made a simple static class (LZMAtools) that calls the right parts of the SDK.

The LZMA SDK is placed in the public domain, and the same applies to the static class I made. I removed a few files (CommandLineParser.cs, LzmaAlone.cs and LzmaBench.cs) from the SDK since those aren’t needed in general Unity usage, but otherwise all files are the same.

Code downloads:
Bitbucket
1586984–130888–lzma_v101.unitypackage (21.5 KB)

And the test code:

    // Goes inside a MonoBehaviour; needs "using System.Text;" and "using UnityEngine;".
    void Start ()
    {
        // Convert 10000 character string to byte array.
        byte[] text1 = Encoding.ASCII.GetBytes(new string ('X', 10000));
        byte[] compressed = LZMAtools.CompressByteArrayToLZMAByteArray(text1);
        byte[] text2 = LZMAtools.DecompressLZMAByteArrayToByteArray(compressed); 

        string longstring = "defined input is deluciously delicious.14 And here and Nora called The reversal from ground from here and executed with touch the country road, Nora made of, reliance on, can’t publish the goals of grandeur, said to his book and encouraging an envelope, and enable entry into the chryssial shimmering of hers, so God of information in her hands Spiros sits down the sign of winter? —It’s kind of Spice Christ. It is one hundred birds circle above the text: They did we said. 69 percent dead. Sissy Cogan’s shadow. —Are you x then sings.) I’m 96 percent dead humanoid figure,";
        byte[] text3 = Encoding.ASCII.GetBytes(longstring);
        byte[] compressed2 = LZMAtools.CompressByteArrayToLZMAByteArray(text3);
        byte[] text4 = LZMAtools.DecompressLZMAByteArrayToByteArray(compressed2);

        Debug.Log ("text1 size: " + text1.Length);
        Debug.Log ("compressed size:" + compressed.Length);
        Debug.Log ("text2 size: " + text2.Length);
        Debug.Log ("are equal: " + ByteArraysEqual (text1, text2));


        Debug.Log ("text3 size: " + text3.Length);
        Debug.Log ("compressed2 size:" + compressed2.Length);
        Debug.Log ("text4 size: " + text4.Length);
        Debug.Log ("are equal: " + ByteArraysEqual (text3, text4));
    }

    public bool ByteArraysEqual (byte[] b1, byte[] b2)
    {
        if (b1 == b2)
            return true;
        if (b1 == null || b2 == null)
            return false;
        if (b1.Length != b2.Length)
            return false;
        for (int i=0; i < b1.Length; i++)
        {
            if (b1[i] != b2[i])
                return false;
        }

        return true;
    }

Output is:
text1 size: 10000
compressed size:57
text2 size: 10000
are equal: True
text3 size: 574
compressed2 size:414
text4 size: 574
are equal: True

LZMA compresses better than LZF, and it compresses XML very efficiently (520 KB → 12 KB).

This code DOES NOT decompress .7z files! That would require additional code. Also, the logic is single file → single file.

A file created with the LZMA encoder can be decompressed with 7-Zip, Keka, etc., but the original filename is lost since the compressed file doesn’t contain any additional metadata. E.g. if you create myfile.lzma from important.txt with CompressFileToLZMAFile and then extract myfile.lzma with Keka, you get myfile.

EDIT:
Memory usage goes up if you use a large dictionary size. Below is an image that shows memory usage in certain scenarios.


Basically, decompression takes about (dictionary size + 55 KB) of memory and compression about 11.65 * dictionary size.
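
The rule of thumb above can be turned into a quick estimate. This is only a sketch: the class and method names are hypothetical (not part of LZMAtools), and it assumes that "55 KB" means 55 * 1024 bytes.

```csharp
using System;

// Rough memory estimates based on the rule of thumb above.
// Hypothetical helper, not part of LZMAtools or the LZMA SDK.
public static class LzmaMemoryEstimate
{
    // Compression needs roughly 11.65 times the dictionary size.
    const double CompressionFactor = 11.65;

    // Decompression needs roughly the dictionary size plus 55 KB
    // (assumed here to be 55 * 1024 bytes).
    const long DecompressionOverhead = 55 * 1024;

    public static long CompressionBytes(long dictionarySize)
    {
        return (long)Math.Round(dictionarySize * CompressionFactor);
    }

    public static long DecompressionBytes(long dictionarySize)
    {
        return dictionarySize + DecompressionOverhead;
    }
}
```

For a 1 MiB dictionary this estimates about 11.65 MiB for compression and about 1 MiB + 55 KB for decompression, so the dictionary size mostly matters on the device that does the compressing.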

The default dictionary size is 4 MB (which isn’t good for mobile devices if you are doing compression, since memory usage goes up to 46 MB), but you can choose a different dictionary size with the function calls that take LZMADictionarySize dictSize as the last parameter, e.g.

LZMAtools.CompressByteArrayToLZMAFile(lenaImage.bytes, "output.lzma", LZMAtools.LZMADictionarySize.Dict1MiB);

You can also use custom dictionary sizes by setting them manually (create a new enum value with the chosen size in bytes as its value). One shouldn’t choose a dictionary size that is larger than the size of the input file / byte array, since it doesn’t increase compression efficiency.
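
One way to apply that advice is to derive the size from the input length. The helper below is a hypothetical sketch, not part of LZMAtools: it rounds up to a power of two (a choice made for this example), and the 64 KiB lower bound is an assumption, while the upper bound is the 4 MB default.

```csharp
using System;

// Hypothetical helper, not part of LZMAtools: picks a power-of-two
// dictionary size that is just large enough to cover the input.
public static class DictionarySizeChooser
{
    const int MinSize = 64 * 1024;        // assumed lower bound for this sketch
    const int MaxSize = 4 * 1024 * 1024;  // the 4 MB default dictionary size

    public static int ForInputLength(int inputLength)
    {
        if (inputLength < 0)
            throw new ArgumentOutOfRangeException("inputLength");

        int size = MinSize;
        // Double until the dictionary covers the input or the cap is reached.
        while (size < inputLength && size < MaxSize)
            size *= 2;
        return size;
    }
}
```

For a 520 KB input this returns 1 MiB, and anything above 4 MiB is capped at 4 MiB. The returned number is what you would use as the value of a custom enum entry.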

Why does LZMA use so much memory on iOS?
LZMA compression takes a lot of memory on the iOS platform. Is there a way to fix this?

Hi there. Sorry to bump this thread again.

Can anyone help me with retrieving this (compressed) data back from MySQL? I’m storing it fine, but I get stuck when it comes to returning the data to Unity.

My data is stored in BLOB format in a MySQL DB. Can I just echo that data back as a string and decompress it in Unity? I’ve been trying this but I get “OutOfMemoryException: Out of memory” errors.

Do I have to handle this data differently when it comes back down via www.text?

Thanks for the helper script and for the tip, Agent_007. This was easy to integrate and made a huge difference when sending bytes over the net. Very pleased!

Sorry about the late answers. It seems this thread wasn’t on my follow list.

The memory usage should be about the same on all platforms. But the default dictionary size for LZMA in the C# version is const int kDefaultDictionaryLogSize = 22; which means 2^22 bytes (4 megabytes). Here is what the dictionary size means for memory usage:
for compression: (dictSize * 11.5 + 6 MB) + state_size
for decompression: dictSize + state_size
state_size = (4 + (1.5 << (lc + lp))) KB; by default (lc=3, lp=0), state_size = 16 KB.
http://sourceforge.net/p/sevenzip/discussion/45797/thread/524d4695/

So you might want to decrease the dictionary size if you are encoding on devices that don’t have that much RAM. I will add a dictionary size option to the next version, which should arrive during next week. I will also create a table about memory usage.
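
As a worked example of the formulas quoted above, here are the numbers for the default 4 MB dictionary. This reads `1.5 << (lc + lp)` as 1.5 times 2^(lc + lp), which is an interpretation, but it reproduces the quoted 16 KB.

```latex
\begin{align*}
\text{state\_size} &= \left(4 + 1.5 \cdot 2^{\,lc + lp}\right)\ \text{KB}
                    = \left(4 + 1.5 \cdot 2^{3}\right)\ \text{KB} = 16\ \text{KB} \\
\text{compression} &\approx 4\ \text{MB} \cdot 11.5 + 6\ \text{MB} + 16\ \text{KB} \approx 52\ \text{MB} \\
\text{decompression} &\approx 4\ \text{MB} + 16\ \text{KB}
\end{align*}
```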

If you want to handle binary data via Unity’s www class, then use www.bytes. But if you must use www.text for some (bizarre) reason, then base64-encode the data on the server and base64-decode it in the client:
http://stackoverflow.com/questions/11743160/how-do-i-encode-and-decode-a-base64-string
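
The base64 round trip can be sketched as below. The class name is hypothetical, and both directions are shown in C# only for illustration; the server would do the encoding in whatever language it runs.

```csharp
using System;

// Hypothetical helper for moving compressed bytes through a text channel.
public static class Base64Transport
{
    // What the server side would do before sending the BLOB as text.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // What the client does with the received text before decompressing.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

In Unity, the array returned by Decode (or www.bytes directly, if you skip the text route) is what you would pass to LZMAtools.DecompressLZMAByteArrayToByteArray. Reading raw binary through www.text runs it through a text decoder, which can alter the bytes before they ever reach the decompressor.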