Hello! I am new to Unity and I am trying to make a word game.
So I used a text file and converted it into a list in my code. The issue is that whenever I check whether a word exists in the list, it says that it doesn’t. The only word it finds is the one at the very end of the list, which is “ZZZS”. I tried Debug.Log(dictionary[0]) and it shows the first word in the text file, but if I enter that word in my game, it is detected as not a word. I don’t think it’s a case-sensitivity issue, because my game only uses uppercase letters and my text file uses uppercase letters as well. I think the problem is that my file has too many words.
Here’s my code:
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using TMPro;

public class WordChecker : MonoBehaviour
{
    public TextMeshPro display;
    public TextAsset Words;
    public string WordTest;
    private string theWord;

    void Start()
    {
        display = GameObject.Find("WordDisplay").GetComponent<TextMeshPro>();
    }

    public void OnMouseDown()
    {
        theWord = display.text;
        var content = Words.text;
        var AllWords = content.Split('\n');
        var dictionary = new List<string>(AllWords);
        if (dictionary.Contains(theWord))
        {
            Debug.Log("The word exists in the dictionary.");
        }
        else
        {
            Debug.Log("The word does not exist in the dictionary.");
        }
    }
}
If you happen to know a way to improve this code please let me know.
Ah! Friend, you are another victim of the mess that is newline formats. I downloaded your file to make sure. Welcome to the club.
Basically, Windows programs by default write new lines as “\r\n” instead of “\n”. That is a carriage return before the line feed. It’s so silly; they still treat text like a typewriter, where writing a new line meant returning the carriage to the first column and then moving the paper to the next row. And it will be like that until the end of civilization, because that’s how tech works. Your file uses Windows-style newlines, so when you split on ‘\n’ alone, a ‘\r’ char is left at the end of almost every word (all except the last one).
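To make the failure concrete, here is a minimal sketch that splits on '\n' the same way the script in the question does. The class name CarriageReturnDemo and the sample words "AAH" and "AAHS" are made up for illustration; only "ZZZS" comes from the question.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the original script.
public static class CarriageReturnDemo
{
    // Splits on '\n' only, exactly like content.Split('\n') in the question.
    public static List<string> SplitOnLineFeed(string content)
    {
        return new List<string>(content.Split('\n'));
    }

    // Convenience check so the effect of the leftover '\r' is easy to see.
    public static bool ContainsWord(string content, string word)
    {
        return SplitOnLineFeed(content).Contains(word);
    }
}
```

With the input "AAH\r\nAAHS\r\nZZZS", the resulting entries are "AAH\r", "AAHS\r" and "ZZZS". Looking up "AAH" fails because the stored entry is four characters long, while "ZZZS" is found because nothing follows it in the file.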
So, what can you do? You could use content.Split("\r\n") instead of content.Split('\n'); if your .NET profile lacks the single-string overload, content.Split(new[] { "\r\n" }, System.StringSplitOptions.None) does the same thing. Don’t use content.Split('\r', '\n') because it will add an empty word for each “\r\n” pair. Don’t use content.Split(new[] { '\r', '\n' }, System.StringSplitOptions.RemoveEmptyEntries) because it’s buggy in Mono in these kinds of cases; it will combine a lot of words into one sometimes.
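If you would rather not depend on which Split overload is available, another option (a different technique from the ones above, offered only as a sketch) is to split on '\n' and trim every entry, which strips a trailing '\r' if there is one. The class name WordListParser is hypothetical. Storing the words in a HashSet is also an assumption on my part; it makes Contains faster on a big list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original script.
public static class WordListParser
{
    // Builds a lookup set from raw file text, tolerating both "\r\n" and "\n".
    public static HashSet<string> Parse(string content)
    {
        var words = new HashSet<string>(StringComparer.Ordinal);
        foreach (string line in content.Split('\n'))
        {
            // Trim() removes leading and trailing whitespace, including '\r'.
            string word = line.Trim();
            if (word.Length > 0)
            {
                words.Add(word);
            }
        }
        return words;
    }
}
```

In the script from the question, you could call WordListParser.Parse(Words.text) once in Start(), keep the result in a field, and then test words.Contains(display.text.Trim()) in OnMouseDown(), so the file is not split again on every click.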
A solution that may be more solid is to process your file(s) to convert them from CRLF to LF; your code would then work without any change. There’s software to do that, and some text editors on Windows already have a setting for it. That way, if someone on Mac or Linux ever authors a file, it will be compatible with your game.
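If you prefer to do the conversion yourself instead of using an editor setting, a small one-off tool could look like the sketch below. The class and method names are hypothetical, and it assumes the file is small enough to read into memory and that rewriting it as UTF-8 is acceptable.

```csharp
using System.IO;

// Hypothetical one-off converter, not part of the original script.
public static class LineEndingConverter
{
    // Returns the text with every line ending normalized to '\n'.
    public static string ToLf(string text)
    {
        // Replace CRLF pairs first so each pair becomes a single '\n',
        // then convert any remaining lone '\r'.
        return text.Replace("\r\n", "\n").Replace('\r', '\n');
    }

    // Rewrites the file at the given path in place with LF line endings.
    public static void ConvertFile(string path)
    {
        File.WriteAllText(path, ToLf(File.ReadAllText(path)));
    }
}
```

You could also call ToLf(Words.text) at load time before splitting on '\n', which gives the same result without touching the file on disk.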
That said, I think the best you could do is not to rely on whitespace to separate words. That’s a bug waiting to happen. The CSV format was made for these kinds of cases, so I’d start with that; it’s just words separated by commas and it can be exported from Excel. Or you could use JSON; it’s universal and it lets you add metadata to your words if you ever need it.
Well, that’s actually not silly, and your interpretation is not quite true ^^. The “LF” character is called the “line feed” character (\n). It is not supposed to return the carriage to the first column. Yes, a lot of other systems use this convention, but they actually interpret the control characters incorrectly.
Btw: HTTP also still uses \r\n as a line delimiter, and that’s true on all systems. Pretty much all line-based text protocols use \r\n.
Don’t get me wrong, I also think a single character would make our lives a lot easier. IBM mainframes actually used a completely separate new line character, “NL” (0x15), though none of the other systems have adopted it. So the issue isn’t really the two-character delimiter but the fact that each system rolls its own interpretation. The classic MacOS only used a single “\r”, which is equally misinterpreted. Since there are essentially 3 different interpretations (and \r\n would be the only “correct” one), those are what cause our headache in the first place. In the end it’s just a matter of sticking to a standard, though none of the major systems are willing to let go of their interpretation. Well, MacOS essentially switched to the Unix interpretation. However, you always have a mix of legacy applications and newer ones. You cannot simply decide to switch to a new system “just because”. Responsible companies do not break their standard; ask Linus Torvalds.
That was very illuminating, thanks. About this particular quote, the character I meant to say returns the carriage to the first column is “\r”. I understand that there’s a context where CRLF is more correct than LF, I just think it’s a bit funny that this context is related to typewriters.