Suggestions for "safer" way to write data to a file?

I had a user run into a computer crash, coincidentally at the exact moment my game was writing some data to disk. After restarting their system, the file was heavily corrupted, filled with nothing but empty characters. Making matters worse, Steam sync then uploaded this junk file, overwriting their old cloud save and totally wiping out their progress, with no way to recover it. Obviously a terrible thing.

I thought my approach to file saving was pretty solid. Here’s what I’m doing currently:

// Write the data to a temp file, not the existing file.
File.WriteAllText(tempFileName, dataAsJson);
// Now copy that completed temp file onto the existing file.
File.Copy(tempFileName, fileInfo.FullName, true);
// Now delete the temp file.
File.Delete(tempFileName);

The idea here was that if anything went wrong with File.WriteAllText, the content was being written to a temp file, not to their existing save file. Only after a successful WriteAllText was I copying it to the existing file path. But that is apparently insufficient, and it seems that the crash occurring during File.Copy resulted in a corrupted file.

Anyone have any approaches to a more robust file save?

I’m planning on trying something fairly complex, which I’d happily avoid if someone knows of a simpler way to make my existing approach bulletproof.

Anyway, my proposed new approach will be to make a backup of the existing file before I try to save new data to it. After I perform the File.Copy, I’ll then read the file again, and make sure it has the correct content. As long as it does, I’ll delete the backup file. But if something goes wrong, the backup file will remain. Upon starting up the game, I’ll check whether any of those backup files exist, and prompt the user to recover them.
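
Roughly, it would look something like this (just a sketch; the names are placeholders, and the startup recovery prompt isn’t shown):

using System.IO;

void SaveWithBackup(string savePath, string dataAsJson)
{
    string backupPath = savePath + ".backup";
    string tempPath = savePath + ".tmp";

    // 1. Preserve the current save before touching anything.
    if (File.Exists(savePath))
        File.Copy(savePath, backupPath, true);

    // 2. Write the new data to a temp file, then copy it over the real file.
    File.WriteAllText(tempPath, dataAsJson);
    File.Copy(tempPath, savePath, true);
    File.Delete(tempPath);

    // 3. Read the real file back and verify it matches what we wrote.
    if (File.ReadAllText(savePath) == dataAsJson)
        File.Delete(backupPath); // all good, the backup is no longer needed
    // Otherwise the backup stays on disk, and the game prompts the user to
    // recover it on the next startup.
}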

Or maybe that’ll be vulnerable to the same kind of problems?

Eventually, all things fail. If I’m playing a game that crashes while it’s saving, I chalk that up as bad luck and know that save is done for. You can try to stop it, but you can’t entirely. Keep the backup and store it on the cloud; it gives a bit of redundancy so that not all is lost when the main save is lost.

Just thinking out loud, but what if you kept around 2 save files at all times? When loading, load from the one with the more recent timestamp. When saving, replace the one with the older timestamp. (And then immediately verify the save, if you want to be paranoid.) If there’s ever a problem loading the more recent file, you can use the older as a backup. You never delete a save until you’ve verified that there’s a valid, more-recent save to replace it.

(If your save files aren’t too big, you might actually want to generalize this from 2 saves to N saves, and add a feature where the player can revert to any of their N last saves whenever they want.)
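
A rough sketch of the two-slot rotation (the slot names and the validity check are placeholders):

using System;
using System.IO;
using System.Linq;

string[] slots = { "save_a.json", "save_b.json" };

// When saving: write into whichever slot is older, leaving the newer intact.
string SlotToWrite() =>
    slots.OrderBy(s => File.Exists(s) ? File.GetLastWriteTimeUtc(s)
                                       : DateTime.MinValue)
         .First();

// When loading: try the newest slot first, then fall back to the older one.
string LoadNewestValidSave()
{
    foreach (var slot in slots.Where(File.Exists)
                              .OrderByDescending(File.GetLastWriteTimeUtc))
    {
        string json = File.ReadAllText(slot);
        if (LooksValid(json)) // your real integrity check goes here
            return json;
    }
    return null; // no valid save found
}

bool LooksValid(string json) => !string.IsNullOrEmpty(json); // placeholder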


I agree: use some kind of verification code that is the last thing written to the file. The very last thing written is a random 16-digit code that must be present before the operation continues. It can be the last thing written and the first thing read. It not only must be present but must contain valid characters. All in all, though, if I shut down Windows while it’s in the middle of saving something, you can expect corruption. So unless it happens all the time, it’s really not an issue.
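
If it helps, here is one way that trailing-code idea could look (the marker format here is just an assumption):

using System;
using System.IO;

const string EndMarker = "#END:"; // hypothetical sentinel prefix

void SaveWithMarker(string path, string dataAsJson)
{
    // The random 16-character code is the very last thing written.
    string code = Guid.NewGuid().ToString("N").Substring(0, 16);
    File.WriteAllText(path, dataAsJson + "\n" + EndMarker + code);
}

bool TryLoadWithMarker(string path, out string dataAsJson)
{
    dataAsJson = null;
    string text = File.ReadAllText(path);
    int idx = text.LastIndexOf("\n" + EndMarker, StringComparison.Ordinal);
    // Only trust the file if the marker exists and the code has full length.
    if (idx < 0 || text.Length - idx - EndMarker.Length - 1 != 16)
        return false;
    dataAsJson = text.Substring(0, idx);
    return true;
}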

Yeah, I’m thinking that having a pair of files probably covers all reasonable failure modes. The files themselves are pretty tiny (a few KB), so space isn’t a concern. Probably I just need to confirm, after saving to a file, that everything saved correctly, before I redundantly save to the backup file. That way, there’s always at least one pristine file containing either the very latest data, or at least the data from the previous save attempt.

I’d just hate to have a bunch of people lose data if their power went out, if it’s simple enough to prevent. But as you said, some basic system of verifying that the file saved correctly feels right to me.

Using a temp file / second file is the right approach; however, the important part is how you handle the loading. Because if a corruption has occurred, you have to actually load that temp file instead. In general, in order to protect yourself against corruption (intended or unintended), it’s best to include a checksum / hash of the actual data so you can verify the data integrity. Whether the loading itself fails or the checksum / hash check fails doesn’t really matter; you just know you cannot use / trust this data. So you would load the last backup (i.e. temp file).
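
For example, one common layout (an assumption on my part, not the only way) is to store the hash on the first line and the payload after it:

using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

string HashOf(string data)
{
    using (var sha = SHA256.Create())
        return BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(data)))
                           .Replace("-", "");
}

void SaveWithChecksum(string path, string dataAsJson)
{
    // First line: hash of the payload. Rest of the file: the payload itself.
    File.WriteAllText(path, HashOf(dataAsJson) + "\n" + dataAsJson);
}

bool TryLoadWithChecksum(string path, out string dataAsJson)
{
    dataAsJson = null;
    string text = File.ReadAllText(path);
    int split = text.IndexOf('\n');
    if (split < 0)
        return false;
    string payload = text.Substring(split + 1);
    // A failed read and a failed hash check are treated the same way:
    // do not trust this file, fall back to the backup.
    if (text.Substring(0, split) != HashOf(payload))
        return false;
    dataAsJson = payload;
    return true;
}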

The usual approach is not really to copy the temp file over the old one, but simply to copy the old save to a “.bak” file and write the new file. If the new file is corrupted and cannot be loaded, you simply load the .bak file and recover the actual save file from there.

Just looking at the saving part won’t cut it. The loading part is the important part ^^.
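
So the load path would look roughly like this (TryLoad stands in for whatever read-plus-integrity-check you use):

using System.IO;

// Try the main save first; if it cannot be loaded or fails its integrity
// check, fall back to the .bak copy and restore the main file from it.
bool TryLoadSave(string savePath, out string dataAsJson)
{
    if (TryLoad(savePath, out dataAsJson))
        return true;

    string bakPath = savePath + ".bak";
    if (File.Exists(bakPath) && TryLoad(bakPath, out dataAsJson))
    {
        File.Copy(bakPath, savePath, true); // recover the real save file
        return true;
    }
    return false;
}

bool TryLoad(string path, out string dataAsJson)
{
    dataAsJson = File.Exists(path) ? File.ReadAllText(path) : null;
    return !string.IsNullOrEmpty(dataAsJson); // placeholder integrity check
}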


I would second the keeping-N-files approach Anti and Bunny mention above.

Just make sure you have a handle on when the above sync to Steam takes place, otherwise you could end up chasing the problem around.

Having a checksum or CRC or SHA-1 hash is also a great idea… but always be sure to test it by making a “save corrupted savegame file” function so you can reason about all the edge cases involved.

You probably even want a “save corrupted savegame and crash hard without shutdown code” function so that you can simulate the original condition and have confidence there’s no shutdown code helping you out.


Thanks, guys. I’ll go with a .bak file approach, and verify each save by loading the file immediately after writing it. And @Kurt-Dekker 's advice on catching all the edge cases is well received.


One thing I tend to do is to NOT use COPY but instead use MOVE for replacing the original with the tmp file. It’s not a 100% fix but reduces the risk of corrupted files a little more. So here is what I’d do in your setup:

  • Write save into tmp file
  • Validate tmp file
  • (optional) Move original to backup
  • MOVE tmp file to replace original
  • Upload to server

Note that even if the “slow” write fails, it won’t corrupt the original, and there is no “slow” copy. This operates on the assumption that a failed move is both less bad and less likely, and that these WRITE and MOVE operations are actually performed in sequence by the code and the backend (hard disk, etc.).
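
In code, the sequence might look like this (paths are placeholders):

using System.IO;

void Save(string savePath, string dataAsJson)
{
    string tmpPath = savePath + ".tmp";
    string bakPath = savePath + ".bak";

    // 1. Write the save into the tmp file; the "slow" step never touches
    //    the original.
    File.WriteAllText(tmpPath, dataAsJson);

    // 2. Validate the tmp file before it goes anywhere near the original.
    if (File.ReadAllText(tmpPath) != dataAsJson)
        return; // abort: the original save is still intact

    // 3. (optional) Move the original to a backup.
    if (File.Exists(bakPath))
        File.Delete(bakPath);
    if (File.Exists(savePath))
        File.Move(savePath, bakPath);

    // 4. Move the tmp file into place. No "slow" copy happens here, and
    //    savePath no longer exists, so a plain Move works.
    File.Move(tmpPath, savePath);
}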

If you’re creating the .bak (which is a great idea, by the way), create it when the game is loaded, accepting that the player may lose a little progress. Creating the .bak of the previous save at the same time you write the new save opens the .bak to the same corruption issues. The moment we know the player’s save is uncorrupted is when they first opened it and spent time playing.

Once you have verified that the save file is uncorrupted and entered the game state, that is the time to create a .bak of the state you just loaded. If you detect on load that the save file was corrupted, you can revert to a previous .bak if it exists.
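
In other words, roughly (a sketch; TryLoad is whatever verified load you use):

using System.IO;

string savePath = "save.json";
string bakPath = savePath + ".bak";

if (TryLoad(savePath, out string dataAsJson))
{
    // The save just proved itself loadable: NOW snapshot it as the backup.
    File.WriteAllText(bakPath, dataAsJson);
}
else if (TryLoad(bakPath, out dataAsJson))
{
    // The main save was corrupt; revert to the previous backup.
    File.Copy(bakPath, savePath, true);
}

bool TryLoad(string path, out string data)
{
    data = File.Exists(path) ? File.ReadAllText(path) : null;
    return !string.IsNullOrEmpty(data); // placeholder integrity check
}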

Oh, I see, that’s exactly what geo said. It’s the perfect method in my opinion.

I’m curious why File.Move would be more reliable, and less prone to data corruption, than File.Copy. The downside to File.Move, though, is that it can’t overwrite an existing file, which means that immediately before performing the File.Move, I first need to delete the real file. That makes me a bit nervous (maybe for no good reason), and I’m reluctant to switch unless there’s some solid reason why File.Move is less likely to cause problems than File.Copy.

Although not universal, Move often maps to the operating system’s rename function. On POSIX systems this is just an inode update, so it can even be atomic.

Copy, on the other hand, probably has to assume the worst and do an open-for-write (overwrite), then copy the data from source to dest, then flush and close the file.
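
For what it’s worth, .NET also has File.Replace, which swaps a new file into place and moves the old contents to a backup path in a single call, so you never have to delete the original yourself (note that the destination must already exist):

using System.IO;

string savePath = "save.json";         // the existing save file
string tempFileName = "save.json.tmp"; // where the fresh data gets written
string dataAsJson = "{ }";             // placeholder payload

File.WriteAllText(tempFileName, dataAsJson);
// Swap the temp file into place; the old contents are moved to
// save.json.bak instead of being deleted.
File.Replace(tempFileName, savePath, savePath + ".bak");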


Make saves double buffered. If A is OK the game loads from A but will next write to B. This means you are only doing one file-write operation each time.

The version number inside the file should be incremented on each save. If loading fails the integrity check, it will load the other file. The next save will remedy any issues by itself, so you do not need to save two files each time; minimal footprint.

Obviously it means you read two files each time instead of writing two files each time, so I’m not really sure it’s a game changer by any means.
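
A sketch of that version-counter variant (the file layout here, version on the first line and data on the second, is just an assumption):

using System;
using System.IO;

string[] slots = { "save_a.dat", "save_b.dat" };

// Read a slot: first line is the version counter, second line the payload.
bool TryReadSlot(string path, out int version, out string dataAsJson)
{
    version = -1;
    dataAsJson = null;
    if (!File.Exists(path))
        return false;
    string[] lines = File.ReadAllLines(path);
    if (lines.Length < 2 || !int.TryParse(lines[0], out version))
        return false; // fails the (minimal) integrity check
    dataAsJson = lines[1];
    return true;
}

// Save over the slot holding the older (or invalid) version.
void SaveNext(string dataAsJson)
{
    int vA = TryReadSlot(slots[0], out int a, out _) ? a : -1;
    int vB = TryReadSlot(slots[1], out int b, out _) ? b : -1;
    string target = vA >= vB ? slots[1] : slots[0];
    File.WriteAllLines(target, new[] { (Math.Max(vA, vB) + 1).ToString(), dataAsJson });
}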


Exactly that. Breaking that is harder than messing up a file write/copy where a bunch of data is moved or duplicated.

It should make you nervous if you are in a multi-threaded context. If not, then it should be perfectly fine. Also, don’t delete the original; just move it to a backup (I tend to just add “.backup” to the filename) so you can restore it if anything happens.

I know the concept of revertible changes as being called transactions, or transactional. There is even a transactional file API in C#, though I have never used it. I only know the concept from databases.

That may only look like a downside at first glance. Assuming that a copy is much, much slower than a file move, your file may be in a “corrupt” state for much longer; imagine it being half written. I’d rather have the filesystem report the file as not existing between the two moves than read a half-written file. Admittedly, this depends a lot on how you write to the file (locking) and how it’s implemented in the current runtime. I don’t know how Unity does it, but in general I start by assuming what Kurt described, and thus move would be preferable.

Having multiple or alternating save files is a good idea. But even then I would use the move strategy instead of copy.

This has become a pretty interesting thread. Lots of good suggestions here. Now my main concern is that in my efforts to improve the robustness and integrity of my save/load process, I’ll make a highly complex system that ends up being more prone to failure. :) But some basic testing should help convince me it’s a change for the better. Thanks for all the good ideas.
