Mono JIT taking ages to process a script, any way to bypass?

I have a large dataset that I parsed into a C# class which is basically just a dictionary full of hardcoded entries from the dataset. The class is about 3 MB. When I first access this class, the Mono JIT locks the app for about 60 seconds while it processes it.

public static class Container
{
    public static readonly Dictionary<string, MyData> Data = new()
    {
        { /*[... ctor for class with 90 fields... ]*/ },
        { /*[... ctor for class with 90 fields... ]*/ },
        { /*[... ctor for class with 90 fields... ]*/ },
        { /*[... ctor for class with 90 fields... ]*/ },
        // [3000 lines later...]
    };
}

Is there any way to force Mono to handle this AOT? I tried moving to IL2CPP but the editor still runs in Mono so it doesn’t help my editor play mode entry times. I even tried stuffing it all into a .dll but Mono JIT still takes forever.

Would it be better to just parse it all at runtime on the fly from the original csv file? What’s the ideal way to get around this issue?

Thanks

:open_mouth:

Oh my.

YES!!

Write a simple command-line tool (or use one of the CSV-to-JSON converters that likely already exist) to convert the CSV to JSON, and then use JsonUtility to read in all that data.

It’s worth noting that reading CSV files, contrary to what most think, is far from trivial! There is not a single regex or string.Split that will correctly do it for you. It has to be a parser, unless you can perfectly trim your data to avoid the many pitfalls (e.g. double-quoted text, text containing delimiters or even newlines, number and date interpretation that depends on culture, etc.).

That means you really don’t want to read CSV at runtime and have it fail there, you want to see it fail on your machine, and then rely on json just working fine.
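
To illustrate the quoting pitfall, here is a minimal sketch of a record parser that handles double-quoted fields, delimiters inside quotes, and `""` as an escaped quote. The names `CsvSketch` and `ParseRecord` are made up for this example, and it assumes the caller hands over one complete record at a time; it is not a full CSV implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvSketch
{
    // Splits one CSV record into fields, honoring double-quoted fields,
    // delimiters inside quotes, and "" as an escaped quote.
    // Assumption: the whole record (including embedded newlines) is passed in.
    public static List<string> ParseRecord(string record, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < record.Length; i++)
        {
            char c = record[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // A doubled quote inside a quoted field is a literal quote.
                    if (i + 1 < record.Length && record[i + 1] == '"')
                    {
                        current.Append('"');
                        i++;
                    }
                    else
                    {
                        inQuotes = false;
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());
        return fields;
    }
}
```

For the record `sword,"Long, sharp",12`, a plain `Split(',')` yields four pieces, while `CsvSketch.ParseRecord` yields the three intended fields. Numeric fields should then be parsed with an explicit culture, for example `int.Parse(fields[2], CultureInfo.InvariantCulture)`, so the result does not depend on the machine's regional settings.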

Alternative: import the CSV into SQLite via DB Browser for SQLite. Then use sqlite-net and its Unity companion package from GitHub.


I already wrote editor tooling to automatically parse the sanitized .csv into an auto-generated .cs file full of my data classes. It’s entirely automated at this point. Is your suggestion here that I parse it into a JSON file instead (at editor time), so that I can then later (at runtime) just parse the JSON into a dictionary, which will avoid the Mono JIT impact?

My suggestion is to use code for what it is: code.

And then use code to read the data at runtime. Any way is fine, as long as the data isn’t part of the executable file, where a mere change of data, which has nothing to do with the code itself, requires a recompile. That makes the workflow rigid and time consuming, even without any increase in compile time.
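
A minimal sketch of what reading the data at runtime could look like with Unity's JsonUtility. The types `MyDataRecord` and `MyDataFile`, the `DataStore` class, and the two-field layout are all assumptions for illustration (the real class has about 90 fields). JsonUtility does not deserialize a `Dictionary` or a top-level array directly, so the records are wrapped in a container object and the dictionary is built afterwards.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;

[Serializable]
public class MyDataRecord
{
    public string key; // hypothetical lookup key
    public string a;
    public int b;
}

[Serializable]
public class MyDataFile
{
    // Wrapper object: JsonUtility needs an object at the top level,
    // not a bare array or a Dictionary.
    public MyDataRecord[] records;
}

public static class DataStore
{
    // Expects JSON shaped like {"records":[{"key":"First","a":"blubb","b":42}]}
    public static Dictionary<string, MyDataRecord> Load(string json)
    {
        MyDataFile file = JsonUtility.FromJson<MyDataFile>(json);

        // The record count is known here, so the dictionary can be presized.
        var data = new Dictionary<string, MyDataRecord>(file.records.Length);
        foreach (MyDataRecord record in file.records)
        {
            data[record.key] = record;
        }
        return data;
    }
}
```

The JSON text could come from a `TextAsset`, for example `DataStore.Load(Resources.Load<TextAsset>("mydata").text)`, where `"mydata"` is a placeholder asset name. Lookups afterwards are the same O(1) dictionary lookups as with the hardcoded version.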

Compiling data as code was a crutch we had to work with for lack of tooling, and sometimes to increase performance (e.g. lookup arrays for sin/cos/tan/random), until roughly the 90s, when it eventually faded out of existence thanks to ubiquitous data-loading formats and frameworks, and more horsepower.

So I’m kinda stumped that you would even consider this as an option. :wink:

Well, 3 reasons drove this idea.

  1. Dictionary lookup is O(1) at runtime.
  2. The data being loaded is permanently static.
  3. Due to the nature of the application, perf is king and we’re pushing the envelope everywhere.

Since the dictionary has O(1) lookup, I wanted to eliminate it from the loading process. While it’s not huge if we just parse 3 MB of JSON, we’re still adding to the load times, whereas afaik we could eliminate that by hardcoding the data, which is static in practice anyway. Development convenience is a low priority compared to eking out every inch of runtime performance in this case.

The only roadblock in this use case is the Mono JIT. I was hoping there might be some attribute we could use, or some way to arrange the assembly, to help with this, but evidently not. So I’ll just explore alternatives and evaluate the impact.

No, that’s a complete misconception. You initialize your Dictionary with a field initializer. Field initializers are not magic. They are also code that is executed. Static readonly fields are essentially initialized in the static constructor of the class. Look at this example code:

    // Inside another class
    public class MyData
    {
        public string a;
        public int b;
        public MyData(string aA, int aB)
        {
            a = aA;
            b = aB;
        }
    }

    public static Dictionary<string, MyData> Data = new Dictionary<string, MyData>()
    {
        {"First", new MyData("blubb", 42) },
        {"Second", new MyData("some data", 123) },
        {"Third", new MyData("more data", 456) }
    };

This would result in this IL code:

	// this is the actual dictionary field
	.field /* 04000021 */ public static class [mscorlib]System.Collections.Generic.Dictionary`2<string, class MainForm/MyData> Data


	// this is the static constructor of the outer class
	.method /* 060000EB */ private hidebysig specialname rtspecialname static 
		void .cctor () cil managed 
	{
		// Method begins at RVA 0x6d84
		// Code size 86 (0x56)
		.maxstack 5

		//     Data = new Dictionary<string, MyData>
		//     {
		//         {
		//             "First",
		//             new MyData("blubb", 42)
		//         },
		//         {
		//             "Second",
		//             new MyData("some data", 123)
		//         },
		//         {
		//             "Third",
		//             new MyData("more data", 456)
		//         }
		//     };
		IL_0000: newobj instance void class [mscorlib]System.Collections.Generic.Dictionary`2<string, class MainForm/MyData>::.ctor() /* 0A000124 */
		IL_0005: dup
		IL_0006: ldstr "First" /* 70000B61 */
		IL_000b: ldstr "blubb" /* 70000B6D */
		IL_0010: ldc.i4.s 42
		IL_0012: newobj instance void MainForm/MyData::.ctor(string, int32) /* 0600013F */
		IL_0017: callvirt instance void class [mscorlib]System.Collections.Generic.Dictionary`2<string, class MainForm/MyData>::Add(!0, !1) /* 0A000125 */
		// (no C# code)
		IL_001c: nop
		IL_001d: dup
		IL_001e: ldstr "Second" /* 70000B79 */
		IL_0023: ldstr "some data" /* 70000B87 */
		IL_0028: ldc.i4.s 123
		IL_002a: newobj instance void MainForm/MyData::.ctor(string, int32) /* 0600013F */
		IL_002f: callvirt instance void class [mscorlib]System.Collections.Generic.Dictionary`2<string, class MainForm/MyData>::Add(!0, !1) /* 0A000125 */
		IL_0034: nop
		IL_0035: dup
		IL_0036: ldstr "Third" /* 70000B9B */
		IL_003b: ldstr "more data" /* 70000BA7 */
		IL_0040: ldc.i4 456
		IL_0045: newobj instance void MainForm/MyData::.ctor(string, int32) /* 0600013F */
		IL_004a: callvirt instance void class [mscorlib]System.Collections.Generic.Dictionary`2<string, class MainForm/MyData>::Add(!0, !1) /* 0A000125 */
		// }
		IL_004f: nop
		IL_0050: stsfld class [mscorlib]System.Collections.Generic.Dictionary`2<string, class MainForm/MyData> MainForm::Data /* 04000021 */
		IL_0055: ret
	} // end of method MainForm::.cctor

This was extracted using ILSpy from the actual compiled code.

As you can see, the initialization of your Dictionary is not magic, but simple linear code that is executed line by line. All the class instances have to be created and added to the dictionary, one by one.

All reference types need to be created on the heap, so there has to be some code which actually creates them. A dictionary is a class that internally usually holds two arrays, one that does the hash referencing and one that actually contains the elements. Objects in OOP need to be created at some point.

There are rare exceptions, especially when it comes to attributes. Attributes are metadata which is part of the type information. The instances of those classes are loaded from somewhat serialized data as they will appear in memory. Though this also depends on the complexity of the attribute and those aren’t considered “normal” classes anyways as they are part of the reflection / type system.

So the assumption that hardcoding data into code somehow makes it load faster is not true.

PS: I actually find it kinda interesting that the generated initialization code does not even set the capacity of the dictionary when it creates it. That means the dictionary starts out small and has to grow dynamically: whenever an Add call exceeds the current capacity, the internal arrays are reallocated at a larger size and the old ones become garbage. So it would create quite a bit of garbage when you have many elements. Though loading the data at runtime from JSON or CSV has essentially the same issue when the number of elements is not known in advance. Still, even when we talk about a million entries, this should be quite fast.
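
The capacity point can be sketched like this, assuming the rows have already been parsed into a list so their count is known before the dictionary is built. `PresizedLoader` and the tuple row shape are hypothetical; `MyData` mirrors the two-field example class above.

```csharp
using System;
using System.Collections.Generic;

public sealed class MyData
{
    public readonly string A;
    public readonly int B;
    public MyData(string a, int b) { A = a; B = b; }
}

public static class PresizedLoader
{
    // Builds the lookup from rows that were parsed elsewhere.
    // Passing the row count as the initial capacity lets the dictionary
    // allocate its internal arrays once instead of regrowing them while filling.
    public static Dictionary<string, MyData> Build(
        IReadOnlyList<(string Key, string A, int B)> rows)
    {
        var data = new Dictionary<string, MyData>(rows.Count, StringComparer.Ordinal);
        foreach (var row in rows)
        {
            data.Add(row.Key, new MyData(row.A, row.B));
        }
        return data;
    }
}
```

A collection initializer cannot express this directly, but writing `new Dictionary<string, MyData>(3000) { ... }` in generated code would have a similar effect, since the initializer entries are still just `Add` calls on the already-constructed dictionary.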


I see. That makes sense. I suppose I just assumed that the ctor cost would naturally be lower than parsing string data, which was not a good assumption. You also make a great point about the Add function and the dictionary dynamically growing.

In testing, parsing the JSON directly into the class takes about 500 ms, which I can maybe push to a thread while the other loading occurs.
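
Pushing the parse onto a thread could look roughly like this. It is a sketch only: `BackgroundLoad` and the `parse` delegate are placeholders for whatever turns the raw text into the lookup, and it assumes the text has already been read on the main thread and that the parse function avoids Unity APIs that are restricted to the main thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BackgroundLoad
{
    // Starts parsing on a thread-pool thread and returns immediately,
    // so the caller can continue with other loading work in the meantime.
    public static Task<Dictionary<string, T>> StartAsync<T>(
        string rawText, Func<string, Dictionary<string, T>> parse)
    {
        return Task.Run(() => parse(rawText));
    }
}
```

The caller would keep the returned task, carry on with the rest of the loading, and only `await` it (or read `Result`) at the point where the data is first needed. An exception thrown inside `parse` surfaces at that point rather than where the task was started.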

Thanks for the feedback, this clarifies why hardcoding the data isn’t good in practice.
