Saving/Loading with encryption

Okay, so for quite some time I was bothered by the question “How do I save a large amount of data locally and load it later?”

Forums and answers helped me quite a lot. I learned much about serialization and such, and as I discovered new methods, choices began to appear.

I’m creating a large 2D world that needs to be saved and loaded, and I decided to use plain text saving because it is both faster and smaller than XML. Because I don’t want players to edit save files so easily, I also needed encryption.

I found 3 easy-to-use but good encryption methods, so if someone is looking for something like that, here it is all in one place.

Here is the code:

using UnityEngine;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Diagnostics;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

public class NewBehaviourScript : MonoBehaviour {

    //List to store variables
    public static List<string> mylist = new List<string>();
    //Key for XOR encryption
    int key = 129;
    //Just basic stopwatch so i can measure time
    public Stopwatch time;

    // Use this for initialization
    void Start () {
        time = new Stopwatch();
        time.Start();

        //Generates 10 blocks with x variable 0-10, y = 0-10, b = 0-3, saves them as string separated by , and then add string to list
        for (int i = 0; i < 10; i++)
        {
            string a = "";
            int x = Random.Range(0, 11);
            int y = Random.Range(0, 11);
            int b = Random.Range(0, 4);
            a = x + "," + y + "," + b;
            mylist.Add(a);
        }
        time.Stop();
        print("Time to generate values" + time.Elapsed);
        time.Reset();

        //You can just ignore stopwatch functions
        time.Start();
        //Classic saving of strings
        SaveInt();
        time.Stop();
        print("Int save time:" + time.Elapsed);
        time.Reset();

        time.Start();
        //Saving by XOR encryption
        SaveXOR();
        time.Stop();
        print("XOR save time:" + time.Elapsed);
        time.Reset();

        time.Start();
        //Saving with binary formatter
        SaveBinaryFormatter();
        time.Stop();
        print("BF save time:" + time.Elapsed);
        time.Reset();

        time.Start();
        //Saving with binary formatter but with base 64
        SaveBinaryFormatter64String();
        time.Stop();
        print("64 save time:" + time.Elapsed);
        time.Reset();

    }

    //With StreamWriter saves all strings line by line to text file
    void SaveInt()
    {
        StreamWriter file = new StreamWriter(Application.persistentDataPath + "/int.txt");
        for (int i = 0; i < mylist.Count; i++)
        {
            string a = mylist[i];
            file.WriteLine(a);
            file.Flush();
        }
        file.Close();
    }

    //Takes string, encrypts string and then saves it to file
    void SaveXOR()
    {
        StreamWriter file = new StreamWriter(Application.persistentDataPath + "/XOR.txt");
        for (int i = 0; i < mylist.Count; i++)
        {
            string a = mylist[i];
            file.WriteLine(EncryptDecrypt(a));
            file.Flush();
        }
        file.Close();
    }

    //XOR encryption by key: basically it takes the character code of each character of the string and XORs (^) it with the key
    public string EncryptDecrypt(string textToEncrypt)
    {
        StringBuilder inSb = new StringBuilder(textToEncrypt);
        StringBuilder outSb = new StringBuilder(textToEncrypt.Length);
        char c;
        for (int i = 0; i < textToEncrypt.Length; i++)
        {
            c = inSb[i];
            c = (char)(c ^ key);
            outSb.Append(c);
        }
        return outSb.ToString();
    }

    //Uses binary formatter to save data
    void SaveBinaryFormatter()
    {
        BinaryFormatter bf = new BinaryFormatter();
        FileStream file = File.Create(Application.persistentDataPath + "/bf.txt");
        bf.Serialize(file, mylist);
        file.Close();    //Close the stream so the file is written out and can be opened again
    }

    //Serializes with binary formatter, converts the bytes to a base 64 string and then saves that string as text
    void SaveBinaryFormatter64String()
    {
        BinaryFormatter bf = new BinaryFormatter();
        StreamWriter file = new StreamWriter(Application.persistentDataPath + "/64.txt");
        MemoryStream ms = new MemoryStream();
        bf.Serialize(ms, mylist);
        string a = System.Convert.ToBase64String(ms.ToArray());
        file.WriteLine(a);
        file.Close();
    }
}

First I saved a normal string with no encryption; a line of data looks like this: “7,3,0”

Second is XOR encryption, which seems “unreadable” but is quite easy to break, and looks like this: “¶´³”

Third is binary formatter, which looks something like this: " 7,5,2 " (it has some extra characters not recognized by the forum)

Fourth and last is binary formatter with base 64, which looks like this: “AQAAAAAAAAAEAQAAAH9TeXN0ZW0uQ29sbGV”
(they are ordered by protection from weakest to strongest)
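To illustrate why the XOR output is easy to break, here is a minimal standalone sketch (no Unity needed). The class and method names are made up for this example; `Xor` mirrors the `EncryptDecrypt` function above, and `RecoverKey` assumes the attacker can guess one original character, for instance the comma separator.

```csharp
using System.Text;

public static class XorDemo
{
    // Same idea as EncryptDecrypt in the post: XOR every character with one key.
    // Applying it twice with the same key returns the original text.
    public static string Xor(string text, int key)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
            sb.Append((char)(c ^ key));
        return sb.ToString();
    }

    // Known-plaintext attack: if one original character is known (save lines
    // contain ','), XOR-ing it with the scrambled character yields the key.
    public static int RecoverKey(char scrambled, char knownPlain)
    {
        return scrambled ^ knownPlain;
    }
}
```

With the key 129 from the post, scrambling “7,3,0” and passing the second scrambled character together with ',' to `RecoverKey` gives back 129.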

As you can see, the first one is really easy to break: just change some numbers and you are done. The second and third are a bit harder, and the fourth seems to be the strongest encryption.
But what is the cost in processing power (I’m aiming for Android phones, so less processing power) and data size?
I made a few tests to show exactly how each one performs:


Here you can see how much time (in seconds) I needed to save the file and how much space it takes with each encryption.

The basic string save is the smallest but only 3rd in saving time, and it is the worst at data protection, as it is the easiest for any user to edit.

XOR encryption is 3rd by size (1st is smallest) and the slowest of them all, so it is not good performance-wise, but it looks like “advanced encryption” and will drive off any hacker who is not determined to break it (which, as I already said, is quite easy).

Binary formatting is very good: it is easy to set up in code, it is the fastest (it is the slowest with small amounts of variables, but that does not matter at such a small time frame), and by size it is 2nd after the basic string saving method. I did not find much info about its security, but it seems to be better protected than XOR.

Base 64 produces the biggest files, but it is one of the fastest, right after BF, and has better protection than BF. I find it has some trouble at loading, though, because you need to convert back from base 64 and then deserialize, and deserializing returns an object.

So now you know more about saving game data to text files, have a code reference for easy setup, and can see stats on how each method performs.

I also made graphs at a bigger scale (5,000,000, as needed for my project) to see how they compare with large data:


At the larger scale XOR takes by far the longest time and BF needs almost half the time; comparing size, the basic string is the smallest and base 64 takes the most space.

Conclusion:
If you store a LARGE amount of data, the best method would be BF, because it has a relatively small size and is the fastest while giving pretty decent protection, but you would also need to compress the file (if some values repeat they can be shrunk, like xxxxx can become 5*x). For smaller amounts of data I would go for XOR, because time is not of such essence and the file size is not SUPER BIG, yet it seems very complicated to crack and is VERY easy to save/load.

Good luck with your coding. I hope the post is not boring and does not have too much useless stuff, as I tried to combine everything in one place for future searches.




I would like to hear which encryption and saving methods you use.

Could you add their respective loading methods for completeness? Other than that, really nice work and much appreciated.

Oh, sorry :)

So, for loading (for testing purposes) I used a second list so I could compare the two (original and loaded).

In initialization I defined a second List<>:

List<string> twolist;

After that, when I wanted to load into the List<>, I used this call:

twolist = Load();

The loading functions need to return a List<>, and here is a copy/paste script for all 4 methods.

The basic string method:

List<string> Load()
    {
        string[] lines = File.ReadAllLines(Application.persistentDataPath + "/int.txt");
        List<string> temp = new List<string>();
        for (int i = 0; i < lines.Length; i++)
            temp.Add(lines[i]);
        return temp;
    }

For XOR you first need to decrypt each line and then add it to the list:

List<string> Load()
    {
        string[] lines = File.ReadAllLines(Application.persistentDataPath + "/XOR.txt");
        List<string> temp = new List<string>();
        for (int i = 0; i < lines.Length; i++)
            temp.Add(EncryptDecrypt(lines[i]));
        return temp;
    }

And Binary Formatter:

List<string> Load()
    {
        BinaryFormatter bf = new BinaryFormatter();
        FileStream file = new FileStream(Application.persistentDataPath + "/bf.txt", FileMode.Open);
        List<string> loaded = bf.Deserialize(file) as List<string>;
        file.Close();    //Close the stream so the file can be opened again later
        return loaded;
    }

Finally from base 64:

List<string> Load()
    {
        BinaryFormatter bf = new BinaryFormatter();
        StreamReader file = new StreamReader(Application.persistentDataPath + "/64.txt");
        string a = file.ReadToEnd();
        file.Close();    //Close the reader once the text is in memory
        MemoryStream ms = new MemoryStream(System.Convert.FromBase64String(a));
        return bf.Deserialize(ms) as List<string>;
    }

As you can see, the process is much the same but goes in the opposite direction. It is easy to set up and gives a lot of control over save files and how you want to save/load them.

Hope it helps :)

I also want to point out that you need to close the StreamWriter in the saving functions of the first two methods, like this:

void SaveInt()
    {
        StreamWriter file = new StreamWriter(Application.persistentDataPath + "/int.txt");
        for (int i = 0; i < mylist.Count; i++)
        {
            string a = mylist[i];
            file.WriteLine(a);
            file.Flush();
        }
        file.Close();    //Add this after for loop
    }

    void SaveXOR()
    {
        StreamWriter file = new StreamWriter(Application.persistentDataPath + "/XOR.txt");
        for (int i = 0; i < mylist.Count; i++)
        {
            string a = mylist[i];
            file.WriteLine(EncryptDecrypt(a));
            file.Flush();
        }
        file.Close();     //Add this after for loop
    }

Otherwise you will get an error when you try to open another stream on the same file.
I will also edit this in the original post.

I also wanted to share the loading times:


And a table of the best in each field (best to worst):

Saving time: BF, Base 64, Int, XOR
Loading time: Int, XOR, BF, Base 64
Size: Int, BF, XOR, Base 64
Security: Base 64, BF, XOR, Int

First of all, nice comparison and thanks for the effort.

I just wonder why you chose XOR and not BF. As a player I would prefer fast saving over fast loading…

If it is a small amount of data, like 1,000, and a player wants to “hack” his save file to gain an advantage, he will try to edit the save file, which is what we aim to protect.
When he sees the XOR output, most would say “What is this?”, while BF looks easy to edit. But if he edits a BF save file, it is ruined and can’t be loaded anymore.
As you see, BF is better in every aspect, but the difference in time and size is not big here, so it mostly depends on which method the developer needs.

You do realize that what you’re doing is not encryption, but merely obscuring things? I can just bypass your “encryption” by whipping out Cheat Engine and hack myself a thousand lives or gold or what have you.

Yeah, I know, but I’m talking about a singleplayer game and keeping casual gamers from editing. This is encryption, just simple encryption, and it is enough to secure your game from unwanted visitors.

To me it seems you make a string “7,3,0” and then save and load this string. Is this correct?

If that is the case, “7” isn’t going to be the binary 0111 (or its 32-bit equivalent, assuming int32) but the ASCII representation formatted as a 32-bit number. This is very inefficient. The number 12 in this format will use 64 bits.
But I can’t get the file sizes to add up. Are you sure that the datasets are comparable? Based on your random generator, one dataset can use random numbers like 9,5,2 while another can use 10,11,2. That is a huge difference!

Anyway;

To make a binary format work, it is common to define headers and sections in the file:
[8-bit datatype] [16-bit length] [32 bits x length of data] [the next 8-bit datatype] etc.
Or,
if you don’t have a large array of data, your data type can be players, rocks, 2D coords, etc.:
[8-bit datatype (rock tiles)] [16-bit length] [256 bytes of rock tile data x length]
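A header-plus-section layout like this can be sketched with `BinaryWriter` and `BinaryReader`. This is only an illustration under some assumptions: the class name, the `RockTiles` type id and the choice to store the payload as raw bytes are invented for the example, and the length field counts bytes.

```csharp
using System;
using System.IO;

public static class SectionFormat
{
    // Hypothetical type id, invented for this sketch.
    public const byte RockTiles = 1;

    // Writes one section as [8-bit datatype][16-bit length][length bytes of data].
    public static void WriteSection(BinaryWriter writer, byte dataType, byte[] data)
    {
        if (data.Length > ushort.MaxValue)
            throw new ArgumentException("Section too large for a 16-bit length field.");
        writer.Write(dataType);
        writer.Write((ushort)data.Length);
        writer.Write(data);
    }

    // Reads one section back; the header tells the reader how many bytes follow.
    public static byte[] ReadSection(BinaryReader reader, out byte dataType)
    {
        dataType = reader.ReadByte();
        ushort length = reader.ReadUInt16();
        return reader.ReadBytes(length);
    }
}
```

A section holding three data bytes takes 6 bytes in total: 1 for the type, 2 for the length and 3 for the data. Because every section starts with its type and length, a reader can skip sections it is not interested in.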

In your example above, the block you created would fit in 4 bits + 4 bits + 2 bits, so you could use 10 bits to save it (plus header) instead of 48 bits (3x8 for the numbers, 2x8 for the commas, and 8 for the end of line).
(The binary format file size doesn’t quite add up to 6 chars as int32, so this is where I got confused by the file generation.)
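The 4 + 4 + 2 bit idea can be tried out with a few shifts and masks. This is a simplified sketch: the names are invented, and for convenience the 10 bits are stored in a 16-bit `ushort` (so 6 bits are wasted); packing blocks back to back across byte boundaries would need a small bit buffer on top of this.

```csharp
public static class BlockPacking
{
    // x and y use 4 bits each (0-15), b uses 2 bits (0-3): 10 bits in total.
    // Layout inside the ushort: xxxx yyyy bb (x in the highest of the 10 bits).
    public static ushort Pack(int x, int y, int b)
    {
        return (ushort)(((x & 0xF) << 6) | ((y & 0xF) << 2) | (b & 0x3));
    }

    // Reverses Pack by shifting each field down and masking off the rest.
    public static void Unpack(ushort packed, out int x, out int y, out int b)
    {
        x = (packed >> 6) & 0xF;
        y = (packed >> 2) & 0xF;
        b = packed & 0x3;
    }
}
```

For the block “7,3,0”, `Pack(7, 3, 0)` returns 460, and `Unpack(460, ...)` gives back 7, 3 and 0.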

To encrypt this I don’t encrypt “7”. I encrypt the entire binary file, so there is no size difference (other than some padding) between an encrypted file and a regular file. For savegame files (say 10MB) the performance impact of encryption is also negligible if efficiently implemented.

But in this case encryption is shooting a sparrow with a cannon. Have you considered simply adding a checksum at the end? Even a simple algorithm is going to take some effort to figure out.
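As a rough sketch of the checksum idea: compute a number from all the save bytes, append it to the file, and refuse to load the file if the number no longer matches. The class name, the seed value and the multiply-and-add formula are all assumptions made up for this example; this is not a cryptographic signature, only a hurdle for casual editing.

```csharp
using System;

public static class SaveChecksum
{
    // Hypothetical seed; any value only the game knows makes guessing harder.
    const uint Seed = 0x5EEDu;

    // Simple multiply-and-add checksum over all bytes (wraps around on overflow).
    public static uint Compute(byte[] data)
    {
        uint sum = Seed;
        unchecked
        {
            foreach (byte b in data)
                sum = sum * 31u + b;
        }
        return sum;
    }

    // Returns the data followed by its 4-byte checksum.
    public static byte[] Append(byte[] data)
    {
        byte[] result = new byte[data.Length + 4];
        Array.Copy(data, result, data.Length);
        BitConverter.GetBytes(Compute(data)).CopyTo(result, data.Length);
        return result;
    }

    // True if the last 4 bytes match the checksum of everything before them.
    public static bool Verify(byte[] fileBytes)
    {
        if (fileBytes.Length < 4) return false;
        byte[] data = new byte[fileBytes.Length - 4];
        Array.Copy(fileBytes, data, data.Length);
        uint stored = BitConverter.ToUInt32(fileBytes, data.Length);
        return stored == Compute(data);
    }
}
```

On save you would write the result of `Append` to disk; on load, `Verify` returns false as soon as a single data byte has been changed without also updating the checksum.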

@Arcatus I am also quite new to this. I think I roughly get what you want to say, but I don’t have a clear idea of the concept you have in mind. Could you explain it a little bit more (maybe with an example, if you have one in mind)?
Thanks for the response.

The most important thing when operating on large datasets is to understand the dataset, and use that to your advantage.

So, we can make some back-of-the-envelope calculations.
Say you want to save a large 2D world:

One terrain tile contains the following data:

  • Position; Vector2, 2xfloats = 8Bytes
  • TerrainType; 1 Byte
  • Total: 9 Bytes of raw data.

If you output this to file as a string you would typically write out:

  • one Vector2 as a string will be two (typically) 3-digit numbers; 6 chars = 12 bytes
  • TerrainType as a single-digit number = 2 bytes
  • 2 separators and “end of line” = 6 bytes

So when written as a string, the 9 bytes of information are written as 20. For small files (up to some hundred KB) this is often convenient for the programmer and the performance impact is minimal.

If your world is 1000x1000 you can probably do just fine by writing a stream of 2 floats, 1 byte, 2 floats, 1 byte, 2 floats, 1 byte, etc, etc. and just lay out the world.

A 1000x1000 world written like this is ~9MB. The string version would be ~20MB, but it depends on the values. Fractions sit comfortably within the 4 bytes of a float, but a fraction can easily be an 8-digit number, so the same number as a string is 16 bytes long! Be careful when using strings to save (lots of) data! If the positions are fractions, the file size would be ~40MB instead of 9.
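The “2 floats, 1 byte” stream could look roughly like this with `BinaryWriter`. It is a minimal sketch: the class name is invented, and the positions are passed as plain float arrays instead of Unity's Vector2 so that the example runs without the engine.

```csharp
using System.IO;

public static class TileStream
{
    // Writes each tile as 2 floats (position) + 1 byte (terrain type) = 9 bytes.
    public static byte[] Write(float[] xs, float[] ys, byte[] terrainTypes)
    {
        using (MemoryStream ms = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(ms))
        {
            for (int i = 0; i < terrainTypes.Length; i++)
            {
                writer.Write(xs[i]);            // 4 bytes
                writer.Write(ys[i]);            // 4 bytes
                writer.Write(terrainTypes[i]);  // 1 byte
            }
            writer.Flush();
            return ms.ToArray();
        }
    }
}
```

Two tiles written this way produce 18 bytes no matter how many digits the positions have, which is where the ~9MB for 1000x1000 tiles comes from. To write to disk, the same loop works on a `BinaryWriter` wrapped around a `FileStream`.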

But even when using a binary dataset we are still very inefficient.

Say the world is 10 000x10 000. With the above binary method we have 900MB of data (and some memory issues).

We can’t display and manage all these tiles, so by dividing the file into sections and data headers we can make a compressed, indexed dataset we can access on demand. There are many ways to do this, and it takes a lot of effort to get this efficient.

But for example:
A file header can contain pointers to 100 sections of 1 000 000 tiles.
9 bytes of data in each tile puts us at 9MB sections. (9 bytes is perhaps a bit optimistic, but even at 25 bytes (objects on the tiles, etc.) we are at a (barely) manageable 25MB of data.)

But we are not going to save 25MB to the file.
We must compress:

  • By converting the Vector2 position to an offset in the section header; we now save 4 bytes each time we need to save the position.
  • By only writing the offset once for consecutive terrains: [u8, datatype=Terrain][u32 length][u32 offset][u8 TerrainType]
  • By only writing the item type once: [u8, datatype=item][u32 number of items][u32 offset loc1][u32 offset loc2]…
  • etc.
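Writing a terrain type only once for a run of consecutive identical tiles is essentially run-length encoding. The sketch below is a simplified stand-in for the layout in the list above: it stores plain (count, type) byte pairs and leaves out the datatype tag and offset fields, and all names are invented for the example.

```csharp
using System.Collections.Generic;

public static class TerrainRle
{
    // Encodes consecutive identical terrain types as (count, type) pairs.
    // A run is capped at 255 so the count always fits in one byte.
    public static List<byte> Encode(byte[] terrain)
    {
        List<byte> output = new List<byte>();
        int i = 0;
        while (i < terrain.Length)
        {
            byte type = terrain[i];
            int run = 1;
            while (i + run < terrain.Length && terrain[i + run] == type && run < 255)
                run++;
            output.Add((byte)run);
            output.Add(type);
            i += run;
        }
        return output;
    }

    // Expands the (count, type) pairs back into one byte per tile.
    public static List<byte> Decode(List<byte> encoded)
    {
        List<byte> output = new List<byte>();
        for (int i = 0; i + 1 < encoded.Count; i += 2)
            for (int n = 0; n < encoded[i]; n++)
                output.Add(encoded[i + 1]);
        return output;
    }
}
```

Seven tiles with the types 5,5,5,5,5,2,2 encode to the four bytes 5,5,2,2 (five tiles of type 5, then two tiles of type 2). Worlds with large uniform areas benefit the most; data where every tile differs from its neighbour would actually grow.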

Effectively I am creating a bit of overhead, almost like a protocol, to group similar data together. The resulting file size can be a tiny fraction of the data it represents.
The resulting file is also “impossible” to edit in a text editor. Add or remove a byte and the entire file breaks.

Apologies for the long text and not a single line of code, but managing large datasets is perhaps the best example of PPPPP:
Proper Planning Prevents Poor Performance.

I cover a very broad topic here, so let me know if there are specific things you would like to know more about. I may not know much about Unity, but large datasets are something I work with on a daily basis.

@Arcatus So, to summarize: if the data is large I need some form of compression (because for small amounts it makes no difference) to reduce size, and I should access parts of the file (with headers and such) to reduce saving/loading time. Also, my string method is very inefficient compared to binary formatter, which is still not efficient because it has no compression. Did I understand you fully?
Well, I noted that for large amounts you still need some form of compression, but I was just writing about saving methods and their encryption. Still, knowing compression methods is really useful and I could include it in the thread.
The only form of compression I have tried so far (and I didn’t test it much) is CLZF compression, and I didn’t set it up correctly. As for writing out an algorithm (like you described in your previous post), I have no idea how to write one myself.

Thanks :)

Yes, strings are very inefficient as a data carrier.

A binary format is much better, but not because it is compressed. You “simply” need to pick the right datatype for the job. The default int datatype in C# is int32. This means that there are always 32 bits of information. C# doesn’t care if the value is 543204 or 1.
When dealing with large datasets this is important. If a tile can have 0-100 trees it should be an int8 (byte) instead. This isn’t compression; it’s just using the right tool for the job.

There are two ways of compressing data:

  1. Write the data file and use compression tools on the file. This can be efficient; the algorithms do an excellent job of compressing a file. I really, really don’t recommend writing your own algorithm. There are lots of open source options.
  2. Be clever when writing to file. This requires a lot of planning, but if done correctly it is much more efficient.
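For option 1, one readily available tool is the `GZipStream` class in `System.IO.Compression`. The sketch below is an assumption-laden illustration, not a recommendation of a specific library: the class name is made up, `Stream.CopyTo` needs .NET 4 or later, and whether `GZipStream` is usable depends on the Unity version and target platform.

```csharp
using System.IO;
using System.IO.Compression;

public static class SaveCompression
{
    // Compresses a byte array in memory with GZip.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the remaining compressed bytes.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(data, 0, data.Length);
            return output.ToArray();   // ToArray still works after the stream is closed
        }
    }

    // Restores the original bytes from a GZip-compressed array.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

You would call `Compress` on the finished save bytes just before writing them to disk and `Decompress` right after reading them back. Repetitive data such as long runs of the same tile type shrinks a lot; data that already looks random barely shrinks at all.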

For example:
(I’ll use strings here, as binary data is just impractical to illustrate)
This is your data package:

[posX0, PosY0, id0] … [posXN, PosYN, idN]

The raw data looks like this:
[2,7,5][2,8,5][2,9,12][2,10,2] ~ 30 chars

A compression tool will recognize the [2, and ,5] pattern and compress this down to something like:
X7YX8YX9,12]X,10,2] ~19chars (This isn’t exactly how compression works, but it illustrates a point)

But this isn’t some generic data; it is a dataset we are in complete control of; If we know that posX is static and posY is incremental, we can do something like this:

[StartPosX, StartPosY][id0]…[idN]

[2,7][5][5][12][2] ~18chars

When putting the file back into memory, our parser knows how this data package works and it adds posX and posY to the ID according to the rules we have made.
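The parsing step can be sketched like this, assuming the rule from the example: posX stays fixed and posY increases by 1 for every id. The class and method names are invented for the illustration.

```csharp
using System.Collections.Generic;

public static class ColumnPackage
{
    // Rebuilds [posX, posY, id] triples from a start position and a list of ids,
    // assuming posX is fixed and posY grows by 1 per entry.
    public static List<int[]> Expand(int startX, int startY, int[] ids)
    {
        List<int[]> tiles = new List<int[]>();
        for (int i = 0; i < ids.Length; i++)
            tiles.Add(new int[] { startX, startY + i, ids[i] });
        return tiles;
    }
}
```

`Expand(2, 7, new int[] { 5, 5, 12, 2 })` rebuilds the raw data from the example: [2,7,5], [2,8,5], [2,9,12], [2,10,2]. The positions never had to be stored per tile because the rule regenerates them.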

@Arcatus In the example you showed how compression works? I knew that :/

About the first part you described: Unity uses int32 (so every variable is stored as a combination of 32 bits (1’s and 0’s)).
So if I need to store a random number 0-30 at the third position, I should use int8 because it is in that range.

But this bothers me a bit: is it 4 bytes for every character in a number (192 = 4+4+4 bytes), or how exactly does it work?
Simply put: what are the limits/ranges of each int (4, 8, 16, 32), and how do I convert my int32 variable that has a value of 5 to int4 or int8 for a smaller size of stored data and, essentially, file?

Also, is binary formatter better than string saving in any case? So should I use Vector3 instead of string in my example?

Thanks :)

I’ve just always used binary; it is fast enough and has a nice file size. If the player wants to change the save game and cheat, he is allowed to do so.

You mean binary formatter? This confuses me a little, since binary has so many meanings.
Also, I would not like the player to be able to cheat EASILY.

Yes

private int var32bit; // this can hold values from approx. -2 000 000 000 to 2 000 000 000
private byte var8bit; // 0-255

Note that modern microprocessors don’t really care if their integer variables are 32-bit or 8-bit, so this is only important if you are saving thousands of datapoints.

Yeah, that is almost right, but a character is 2 bytes, so the string 192 = 2+2+2 bytes.

Converting between types of int is quite straightforward, but be careful around negative values.

Look at https://msdn.microsoft.com/en-us/library/s1ax56ch.aspx for more details.
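As a small illustration of the conversion: C# has no int4; the built-in integer sizes are 8 bits (`byte` 0 to 255, `sbyte` -128 to 127), 16 bits (`short` -32,768 to 32,767, `ushort` 0 to 65,535), 32 bits (`int`) and 64 bits (`long`). Going to a smaller type needs an explicit cast, and the `checked` keyword decides what happens when the value does not fit. The class and method names below are made up for the example.

```csharp
using System;

public static class IntConversion
{
    // 'checked' makes out-of-range values throw an OverflowException
    // instead of silently wrapping around.
    public static byte ToByteChecked(int value)
    {
        return checked((byte)value);
    }

    // With 'unchecked' only the low 8 bits are kept, so negative or large
    // values turn into something unexpected.
    public static byte ToByteWrapping(int value)
    {
        return unchecked((byte)value);
    }
}
```

`ToByteChecked(5)` simply returns 5. `ToByteWrapping(300)` returns 44 and `ToByteWrapping(-1)` returns 255, which is the kind of surprise negative values can cause, while `ToByteChecked(-1)` throws an `OverflowException`.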

Yes, serializing the object directly is better than first converting it to a string and then serializing it. In your example with random data from 0-10, a byte array would be a good choice.

Vector3 is constructed from 3 floats. Floats are 4-byte values that behave differently from the integer datatypes: floating-point types are the ones that can hold and calculate fractional values (although float math is slightly less efficient than integer math).
Conversion of floats is tricky. Even ESA messed that up: https://en.wikipedia.org/wiki/Cluster_(spacecraft)

Okay, I was playing a little with your suggestions and the change is DRAMATIC!!
Here are the stats for saving as string, my way: Save: 9.5 sec, Load: 10 sec, File size: 55MB (this is for 5 million)

Now I changed List<string> to List<int>, and I add x, then y, then b to the list (I know it always goes 3 by 3, so I know how to load it correctly), and the results are GREAT!
Save: 0.0893 sec, Load: 0.0565 sec. That is like a 10 sec gain :) BUT the file size is much bigger, 65MB

I was not using int8 and such because I have no idea how to use them, but I will find out.
Also, is there something in between, like int16? int8 is good for the block type but not for x,y, which can go up to 10k.
And yes, I found only LZF compression, which is for strings, so if you know one for a list of ints it would be great.
Thanks for the help :)

Cool. :) Perhaps you could post a zip with the project? It would be fun to run the test here.
I did some quick tests here (not in Unity, so it’s not directly comparable) and I am getting numbers around 0.2 seconds to write a file with 15 million I32s. 15 million I8s take me ~25 milliseconds to write.
What sort of HDD have you got?

The larger file size is probably because single-digit numbers are 2 bytes as a string but 4 bytes as an int. But even though the file is larger, you skip the conversion step between strings and numbers, which is why it is much, much faster.

Some other datatypes would be:
List<short> (I16; -32,768 to 32,767)
List<sbyte> (I8; -128 to 127)

LZF will work on any data; have you seen this thread: LZF compression and decompression for Unity - Unity Engine - Unity Discussions