Parsing a CSV leads to errors when using the last item of the array as a key in a dictionary

I have a curious situation. I’m parsing a CSV file and saving each field to an array as a string. Printing the last string in the array works, but using that same string as a key in a dictionary throws a null error. If I instead type the string literal directly into the dictionary lookup, it works. Stranger still: if I add an extra column of dummy data to the CSV file, then even though the parsing code reads only the columns it needs and the dummy column is never saved or used anywhere, the last string in the array suddenly works as a dictionary key and returns the proper value.

Any idea what is going on here? Technically I have solved my issue by adding dummy data to the csv file, but I have no idea why that fixed it.

var data = Resources.Load<TextAsset>("data/categories").text.Split('\n');

const int firstRow = 1;

for (var rowIndex = firstRow; rowIndex < data.Length; rowIndex++) {
    var row = data[rowIndex].Split(',');
    CategorySaveDictionary.Add(row[0], new CategorySave());
    UnlockFirstLevel(row[0]);
}
Dictionary = new Dictionary<string, string[]>();

var data = Resources.Load<TextAsset>("data/categories").text.Split('\n');

const int firstRow = 1;

for (var rowIndex = firstRow; rowIndex < data.Length; rowIndex++) {
    var row = data[rowIndex].Split(',');
    var category = new Category(
        row[0],
        new[] {
            row[1],
            row[2],
            row[3],
            row[4],
            row[5],
            row[6],
            row[7],
            row[8],
            row[9],
            row[10],
            row[11]
        }
    );

    Dictionary.Add(category.Name, category.Words);
}

Do you have any reason to think we’d be able to give you an answer without seeing your CSV-parsing code?


Heh, yeah that was dumb. I edited my post.

For clarity:

By “last string in the array”, are you talking about the contents of the last column? Or the first column of the last row? I’m assuming the latter, since that makes the most sense.

Any particular reason you create a Category() object, and then use the fields of that object but otherwise discard it? Why not use row[0] and your string array directly without going through the Category class? Seems like an added place for things to go wrong.

For your array code, you don’t need to hardcode every index:

string key = row[0];
string[] values = new string[row.Length - 1];
for (int i = 0; i < values.Length; i++) {
    values[i] = row[i + 1];
}
dictionary.Add(key, values);
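For what it’s worth, the same copy can be written with LINQ’s Skip, assuming System.Linq is available (it is in Unity’s .NET profile). A minimal, self-contained sketch with made-up row data:

```csharp
using System;
using System.Linq;

class SkipExample {
    static void Main() {
        // Hypothetical row: first cell is the key, the rest are values.
        string[] row = { "animals", "cat", "dog", "bird" };
        string key = row[0];
        // Skip(1) yields every element after the key column.
        string[] values = row.Skip(1).ToArray();
        Console.WriteLine(string.Join(",", values)); // cat,dog,bird
    }
}
```

The manual loop avoids the small LINQ allocation, so either is fine; this is purely a readability trade-off.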

I’m not sure if either of the last two questions will solve the issue, but it’d be good practice.


Keep in mind that informally line-formatted data like this only works as long as \n is reliably your line-separation character. If the file actually uses \r or \r\n, you will not get what you expect.

You can use a hex dumper such as xxd to see what your actual line endings are, and see if that is part of your issue.

There are other overloads of Split() that allow you to drop empty entries, as well as to split upon a variety of possible tokens.
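As a sketch of one such overload, this takes an array of separator strings plus a StringSplitOptions flag, covering both points at once (the text variable here is a stand-in for the loaded asset’s contents):

```csharp
using System;

class SplitExample {
    static void Main() {
        // Simulated file contents with Windows line endings and a trailing newline.
        string text = "name,word1\r\nanimals,cat\r\n";
        // Split on any common newline convention and drop the empty entry
        // that the trailing newline would otherwise produce.
        string[] lines = text.Split(
            new[] { "\r\n", "\r", "\n" },
            StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(lines.Length); // 2
    }
}
```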

I would also recommend making something external to Unity that converts your source data into JSON, so that it is at least structured data and you have a fighting chance of parsing it at runtime in Unity, regardless of newline convention.


Seems like overkill, especially if CSVs are already a part of the workflow. If carriage returns are the issue (and they may be!) then accounting for them should be straightforward:

string[] data = .....text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

Of course it has to be considered in the context of maintaining conversion tooling, the rate of actual data update, reliability of future third party CSV tooling, etc.

But at the end of the day, properly-conditioned data is simply easier to reason about.

Source: I have reasoned about data.


I went back to a CSV script I wrote close to a decade ago, and lo and behold, I found this comment to myself:

        // I'm getting an extra line at the end of the CSV file.
        // If the last line is less than 3 characters long, we delete it. That number is
        // arbitrarily picked but likely to work fine, since a real CSV line is highly
        // unlikely to be less than 3 characters long, and an extra empty line should
        // contain nothing but 1 or 2 whitespace characters at most.

So I guess I was hitting issues with sometimes getting an empty line of just whitespace characters. I don’t remember whether I investigated it further, but I use this script all the time and apparently haven’t run into the problem since. What followed that comment was just some code that checks the last line’s length and removes it from the list if it is less than 2 characters long.

It would probably be better to rewrite that to actually check what kind of characters are in the line instead of just using the length, but I probably did it this way because it was faster, or I was being lazy.
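A minimal sketch of that length check, with made-up input (the 3-character threshold is the arbitrary one from the comment above):

```csharp
using System;
using System.Collections.Generic;

class TrailingLineExample {
    static void Main() {
        // Simulated result of splitting a file that ends with a stray newline:
        // two real rows plus one empty trailing entry.
        var lines = new List<string> { "animals,cat,dog", "colors,red,blue", "" };
        // Drop the last line if it is too short to be a real CSV row.
        if (lines.Count > 0 && lines[lines.Count - 1].Length < 3)
            lines.RemoveAt(lines.Count - 1);
        Console.WriteLine(lines.Count); // 2
    }
}
```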
