The last time I needed general byte array compression I used LZF, but now I needed something heavier, so this time I went for LZMA. I took the code from the latest LZMA SDK and it worked fine in Unity. I just made a simple static class (LZMAtools) that calls the right parts of the SDK.
The LZMA SDK is placed in the public domain, and the same applies to the static class I made. I removed a few files (CommandLineParser.cs, LzmaAlone.cs and LzmaBench.cs) from the SDK since those aren’t needed in general Unity usage, but otherwise all files are unchanged.
Code downloads:
Bitbucket
1586984–130888–lzma_v101.unitypackage (21.5 KB)
And the test code:
using System.Text;
using UnityEngine;

// Attach to any GameObject. The class name LZMATest is only an example.
public class LZMATest : MonoBehaviour
{
    void Start ()
    {
        // Round-trip a highly repetitive 10000-character string.
        byte[] text1 = Encoding.ASCII.GetBytes(new string ('X', 10000));
        byte[] compressed = LZMAtools.CompressByteArrayToLZMAByteArray(text1);
        byte[] text2 = LZMAtools.DecompressLZMAByteArrayToByteArray(compressed);

        // Round-trip a short piece of ordinary text.
        // Encoding.ASCII turns each non-ASCII character into a single '?' byte.
        string longstring = "defined input is deluciously delicious.14 And here and Nora called The reversal from ground from here and executed with touch the country road, Nora made of, reliance on, can’t publish the goals of grandeur, said to his book and encouraging an envelope, and enable entry into the chryssial shimmering of hers, so God of information in her hands Spiros sits down the sign of winter? —It’s kind of Spice Christ. It is one hundred birds circle above the text: They did we said. 69 percent dead. Sissy Cogan’s shadow. —Are you x then sings.) I’m 96 percent dead humanoid figure,";
        byte[] text3 = Encoding.ASCII.GetBytes(longstring);
        byte[] compressed2 = LZMAtools.CompressByteArrayToLZMAByteArray(text3);
        byte[] text4 = LZMAtools.DecompressLZMAByteArrayToByteArray(compressed2);

        Debug.Log ("text1 size: " + text1.Length);
        Debug.Log ("compressed size:" + compressed.Length);
        Debug.Log ("text2 size: " + text2.Length);
        Debug.Log ("are equal: " + ByteArraysEqual (text1, text2));
        Debug.Log ("text3 size: " + text3.Length);
        Debug.Log ("compressed2 size:" + compressed2.Length);
        Debug.Log ("text4 size: " + text4.Length);
        Debug.Log ("are equal: " + ByteArraysEqual (text3, text4));
    }

    // Returns true when both arrays are the same reference or hold identical bytes.
    public bool ByteArraysEqual (byte[] b1, byte[] b2)
    {
        if (b1 == b2)
            return true;
        if (b1 == null || b2 == null)
            return false;
        if (b1.Length != b2.Length)
            return false;
        for (int i = 0; i < b1.Length; i++)
        {
            if (b1[i] != b2[i])
                return false;
        }
        return true;
    }
}
Output is:
text1 size: 10000
compressed size:57
text2 size: 10000
are equal: True
text3 size: 574
compressed2 size:414
text4 size: 574
are equal: True
LZMA compresses better than LZF, and it is very efficient on XML (520 KB → 12 KB).
This code DOES NOT decompress .7z files! That would require additional code. Also, the logic is single file → single file.
A file created with the LZMA encoder can be decompressed with 7-Zip, Keka, etc., but the original filename is lost since the compressed file doesn’t contain any additional metadata. For example, if you create myfile.lzma from important.txt with CompressFileToLZMAFile and then extract myfile.lzma with Keka, you get myfile.
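To illustrate why the filename cannot survive, here is a minimal sketch that reads the header of such a file. It assumes the output follows the classic .lzma ("LZMA alone") layout: 1 properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size, 13 bytes in total. I have not confirmed that LZMAtools writes exactly this header, and LzmaHeaderPeek is a hypothetical helper, not part of LZMAtools or the SDK.

```csharp
using System;

// Hypothetical helper, not part of LZMAtools or the LZMA SDK.
// Assumes the classic 13-byte .lzma header layout described above.
public static class LzmaHeaderPeek
{
    public struct Header
    {
        public int Lc, Lp, Pb;          // literal context, literal position, position bits
        public uint DictionarySize;     // in bytes
        public long UncompressedSize;   // -1 means "unknown, ends with a marker"
    }

    public static Header Parse(byte[] data)
    {
        if (data == null || data.Length < 13)
            throw new ArgumentException("Need at least 13 header bytes.");

        int props = data[0];
        if (props >= 9 * 5 * 5)
            throw new ArgumentException("Invalid LZMA properties byte.");

        Header h = new Header();
        // The properties byte encodes (pb * 5 + lp) * 9 + lc.
        h.Lc = props % 9;
        int rest = props / 9;
        h.Lp = rest % 5;
        h.Pb = rest / 5;

        // Dictionary size: 4 bytes, little-endian.
        h.DictionarySize = (uint)(data[1] | data[2] << 8 | data[3] << 16 | data[4] << 24);

        // Uncompressed size: 8 bytes, little-endian.
        long size = 0;
        for (int i = 0; i < 8; i++)
            size |= (long)data[5 + i] << (8 * i);
        h.UncompressedSize = size;

        return h;
    }
}
```

Under that assumption, the header only carries the coder settings and the uncompressed length. There is no field for a name or timestamp, so an extractor can only derive the output name from the archive name.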
EDIT:
Memory usage goes up if you use a large dictionary size. Below is an image that shows memory usage in certain scenarios.
Basically, decompression takes roughly dictionary size + 55 KB of memory, and compression roughly 11.65 × dictionary size.
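As a rough sketch, that rule of thumb can be put into a small helper for budgeting memory before picking a dictionary size. LzmaMemoryEstimate is a hypothetical class of mine, not part of LZMAtools or the SDK, and the two constants are just the measured figures from this post, so treat the results as estimates only.

```csharp
using System;

// Hypothetical helper, not part of LZMAtools or the LZMA SDK.
// The constants are the approximate measurements from this post.
public static class LzmaMemoryEstimate
{
    const double CompressionFactor = 11.65;        // compression ~ 11.65 x dictionary size
    const long DecompressionOverhead = 55 * 1024;  // decompression ~ dictionary size + 55 KB

    // Approximate bytes of memory needed to compress with this dictionary size.
    public static long CompressionBytes(long dictionarySizeBytes)
    {
        return (long)Math.Round(dictionarySizeBytes * CompressionFactor);
    }

    // Approximate bytes of memory needed to decompress with this dictionary size.
    public static long DecompressionBytes(long dictionarySizeBytes)
    {
        return dictionarySizeBytes + DecompressionOverhead;
    }
}
```

For example, LzmaMemoryEstimate.CompressionBytes(1024 * 1024) comes out at roughly 11.65 MB for a 1 MiB dictionary, while decompressing with the same dictionary needs only a little over 1 MB.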
The default dictionary size is 4 MB (which isn’t good for mobile devices if you are doing compression, since memory usage goes to 46 MB), but you can choose a different dictionary size with the function calls that take LZMADictionarySize dictSize as the last parameter, e.g.:
LZMAtools.CompressByteArrayToLZMAFile(lenaImage.bytes, "output.lzma", LZMAtools.LZMADictionarySize.Dict1MiB);
You can also use custom dictionary sizes by setting them manually (create a new enum entry with the chosen size in bytes as its value). You shouldn’t choose a dictionary size that is larger than the input file / byte array, since that doesn’t improve compression efficiency.
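One way to apply that advice is to derive the size from the input length. The sketch below picks the smallest power-of-two size that covers the input, within a floor and a cap. LzmaDictionaryPicker, its parameters, and the 64 KiB / 4 MiB defaults are my own assumptions for illustration. They are not part of LZMAtools, and the power-of-two choice is a convention rather than a requirement stated here.

```csharp
using System;

// Hypothetical helper, not part of LZMAtools or the LZMA SDK.
public static class LzmaDictionaryPicker
{
    // Returns the smallest size (in bytes) that covers inputLength, obtained by
    // doubling minBytes, and never exceeding maxBytes.
    // minBytes is assumed to be a positive power of two.
    public static int PickDictionarySize(long inputLength,
                                         int minBytes = 64 * 1024,
                                         int maxBytes = 4 * 1024 * 1024)
    {
        // Keep the cap at or below 1 GiB so that doubling cannot overflow an int.
        if (maxBytes > (1 << 30))
            maxBytes = 1 << 30;

        int size = minBytes;
        while (size < inputLength && size < maxBytes)
            size *= 2;

        return Math.Min(size, maxBytes);
    }
}
```

With these defaults, the 574-byte test string above would get the 64 KiB floor, and a 520 KB XML file would get 1 MiB. The returned number is a size in bytes, so it can serve as the value of a custom enum entry as described above.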