JSON changes field to ref

Hey,

We have written a standalone editor application for one of our products, in which users can create questions with matching answers (correct or incorrect). When users are editing this data, we save the progress to a text file in JSON format, so users can continue where they left off.

So the basic setup for a question is this:

public record TextQuestionData : QuestionData
{
	[SerializeField] private string question = string.Empty;
	[SerializeField] private string info = string.Empty;
	[SerializeField] private List<StringAnswerData> correctAnswers = new List<StringAnswerData>();
	[SerializeField] private List<StringAnswerData> incorrectAnswers = new List<StringAnswerData>();
}

And the StringAnswerData looks like this:

public record StringAnswerData : AnswerData
{
	[SerializeField] private string answer = string.Empty;
}

In both cases, the type it inherits from holds some additional data (AnswerData has nothing yet), and those base types in turn inherit from BaseData, which allows us to monitor property changes.

In case it matters:
	[Serializable]
	public record BaseData : IBindableDataType, INotifyPropertyChanged
	{
		public event PropertyChangedEventHandler PropertyChanged;

		protected BaseData() { }

		protected void OnPropertyChanged(string propertyName) => PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));

		protected bool SetField<T>(ref T field, T value, bool skipCompare = false, [CallerMemberName] string propertyName = "")
		{
			if (!skipCompare && EqualityComparer<T>.Default.Equals(field, value))
			{
				return false;
			}

			field = value;
			OnPropertyChanged(propertyName);

			return true;
		}
	}

Now, for the issue: We noticed that in some cases, when some of the answers were adjusted, some of the other values changed with them. After some digging around in the code and checking for missed subscribes/unsubscribes, I found the issue was likely with how the data was serialized and deserialized.

For example, this is how one of the questions (in JSON) looks:

"correctAnswers": {
    "_list": [
        {
            "$id": 2,
            "answer": "1"
        },
        {
            "$id": 3,
            "answer": "1"
        }
    ]
},
"incorrectAnswers": {
    "_list": [
        {
            "$ref": 2
        },
        {
            "$ref": 3
        }
    ]
},

So for some reason (without my knowledge) sometimes the answers are changed to a reference to another answer.

We are using the Unity Serialization package (v3.1.2) to write and read the JSON.

Writing is simply:

string jsonData = JsonSerialization.ToJson(SelectedProject);

Reading is simply:

projectData = JsonSerialization.FromJson<ProjectData>(data);

Also probably worth noting that it doesn’t happen all the time. I have not been able to figure out yet how this is happening.

If anyone is able to shed some light on how this can happen, and more importantly, how I can prevent it, that would be great.

Likely it serialises as a reference because… the data is a reference to an existing object during serialisation?

If you’re passing around references like this but want separate instances, you either need to make sure you’re creating new instances, or provide a configuration/settings object for serialisation that in-lines the data, rather than serialising by-reference. For some serialisation libraries, by-reference serialisation is the default/norm.

Yes, I was looking into the serialization parameters just now and noticed the DisableSerializedReferences property.

Every answer is a new instance though, so it shouldn’t happen, I’d assume.

However, we are using UI Toolkit’s ListView and now I am wondering if something goes wrong with pooling… Going to check it out.

I think I have found the issue!

So, the StringAnswer class at the moment is just a string.

When I have a list of items with the same string value, the first time I write the json, it does it as I would expect.

However, now I deserialize the entire project again (structure is basically game → list of questions → list of answers)

If I never open the screen with a question (so I don’t “see” the data) and the data never gets populated in the fields, the only references (in memory, I guess?) are the lists of answers that all contain the same string. I guess the serializer sees those as the same instance/reference and makes them a ref accordingly.

Hope that makes sense…

I don’t know how you are using “records”; I thought these were not available yet in Unity.

Your problem is that a “record” is treated as a mix of reference type and value type. This means that even though it is a reference type (it is allowed to inherit from another record, and is allocated on the heap), its Equals() and GetHashCode() methods are overridden so that any 2 “record” instances that have the same values in their fields return the same hash code and Equals == true.

The serializer thinks that 2 instances with the same “answer” field are the same (basically it maps every instance at the beginning by adding it to a dictionary, and for the 2nd and later “equal” records the dictionary says that it already has that value), so it creates a reference. So you either shouldn’t use records, or you should add a distinctive field (like “id”) to your base “AnswerData” types.

From Records - C# reference | Microsoft Learn

If you don’t override or replace equality methods, the type you declare governs how equality is defined:

  • For class types, two objects are equal if they refer to the same object in memory.
  • For struct types, two objects are equal if they are of the same type and store the same values.
  • For types with the record modifier (record class, record struct, and readonly record struct), two objects are equal if they are of the same type and store the same values.
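
To make the quoted rule concrete, here is a minimal sketch, assuming a hypothetical Answer record with a single public field (not the real StringAnswerData) and a plain .NET console program rather than a Unity script. It only illustrates how two separate record instances compare; it says nothing about what the serializer does internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for StringAnswerData: a record with one field.
public record Answer
{
    public string Text = string.Empty;
}

public static class RecordEqualityDemo
{
    public static void Main()
    {
        var a = new Answer { Text = "1" };
        var b = new Answer { Text = "1" }; // a second, separate instance

        // Two different objects in memory...
        Console.WriteLine(ReferenceEquals(a, b));               // False
        // ...that the compiler-generated members treat as equal.
        Console.WriteLine(a.Equals(b));                         // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // True

        // Any hash-based collection using default equality
        // therefore sees the second instance as a duplicate.
        var seen = new HashSet<object>();
        Console.WriteLine(seen.Add(a));                         // True
        Console.WriteLine(seen.Add(b));                         // False
    }
}
```

If Answer were declared as a class instead, Equals would return False and both Add calls would return True.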

They were added in C# 9.0, which is the language version Unity supports up to.

Not that Unity can serialise them, but other libraries probably can.


Thanks for the clarification. I am aware of how records work. It’s exactly for this reason I chose to use records, in the context for the application I am working on. I also wanted to be able to inherit some of the data structures, so records seemed like the perfect fit.

I didn’t have this clearly in mind, but it makes sense when it comes to serializing the data. But there is still something that doesn’t really make sense to me:

Say I have 2 different questions (and a question has a list of answers, both records). So let’s say it looks like this:

Question 1
answer[0] = “1”
answer[1] = “1”

Question 2
answer[0] = “2”
answer[1] = “2”

According to what you said, I would expect the answers to become refs, since for both questions the answers inside them are equal, right?

When I check the JSON, they aren’t. Neither set is written as refs, and it’s like they’re treated as classes.

Context is probably needed here: In our application, we have a screen where a question is visualized, along with the answers.

Now here comes the added confusion. So now when I deserialize the data and I don’t view the questions (and with that, the answers) and I serialize them again, they do become refs, which is the behaviour I expect to happen.

However, when I do view a question, the specific observed question/answers do not become refs.

So essentially, when the data is observed after the first time it is deserialized, it behaves differently than when it is not observed.

While I now know how to fix my “issue”, I still want to know what’s causing this behaviour. Does the Unity Serializer package not know how to handle records properly? Does it do something under the hood that makes sense in this regard?

Ok this is leaving me a little confused as well. AFAIK the JsonSerializer works like this (when serializedReferences are enabled):

  • It makes a pre-pass through SerializedReferencesVisitor: this visits the full object graph that has been passed to serialize (ProjectData) and maps every unique instance in your graph (as long as it is on a field/property that is visible to Unity.Properties, so public fields, fields marked with [SerializeField]/[SerializeReference], or fields/properties marked with [CreateProperty]). This pre-pass doesn’t check for attributes like [NonSerialized] or [DontSerialize], so it will visit them regardless.
  • When it finds more than 1 occurrence of a reference type instance, it assigns an “Id”.
  • Then real serialization starts, and if no adapter is provided for the type being serialized, it gets serialized by default. This is when it checks the “metadata” from the pre-pass. The first time, if the current instance has an “id” from the pre-pass, it will serialize inline AND add the $id field. For further appearances of the same instance (or an “equal” one in terms of an EqualityComparer<T>.Default), it will just write the “$ref” property instead of inlining the data.

Obviously this is not true for valueTypes, and the way it differentiates between both is

bool isReferenceType = !TypeTraits<TValue>.IsValueType; // TypeTraits<T> defines IsValueType as type.IsValueType

if (isReferenceType)
{
    if (null == value)
        return;
    // this is re-checked in case TValue was an interface type
    isReferenceType = !value.GetType().IsValueType;
}

So AFAIK there is nothing related to “records” that would allow the serializer to identify one.

After this reference check (this is in the pre-pass), values are cast to object and added to a HashSet, which should just use the default EqualityComparer<object> (which in turn relies on the default equality behaviour for records), and then to a Dictionary (to assign an ID) on the 2nd appearance of the instance (in terms of equality).
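
The pre-pass described above can be sketched roughly like this. This is a simplified illustration of the description in this post, not the package’s actual code: ReferencePrePass, Visit, TryGetId and the Answer record are all hypothetical names, and it is written as a plain .NET console program.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for StringAnswerData.
public record Answer
{
    public string Text = string.Empty;
}

// Hypothetical sketch of the described pre-pass; not the package's real code.
public sealed class ReferencePrePass
{
    private readonly HashSet<object> seen = new HashSet<object>();
    private readonly Dictionary<object, int> ids = new Dictionary<object, int>();

    public void Visit(object value)
    {
        // Nulls and value types never become references.
        if (value == null || value.GetType().IsValueType)
            return;

        // HashSet.Add returns false when an "equal" object is already present,
        // so the second appearance is the one that gets an id assigned.
        if (!seen.Add(value) && !ids.ContainsKey(value))
            ids.Add(value, ids.Count);
    }

    // True when the value (or one equal to it) was seen more than once.
    public bool TryGetId(object value, out int id) => ids.TryGetValue(value, out id);
}

public static class PrePassDemo
{
    public static void Main()
    {
        var first = new Answer { Text = "1" };
        var second = new Answer { Text = "1" }; // separate instance, equal value
        var other = new Answer { Text = "2" };

        var prePass = new ReferencePrePass();
        prePass.Visit(first);
        prePass.Visit(second);
        prePass.Visit(other);

        Console.WriteLine(prePass.TryGetId(first, out _));  // True
        Console.WriteLine(prePass.TryGetId(second, out _)); // True
        Console.WriteLine(prePass.TryGetId(other, out _));  // False
    }
}
```

Under these assumptions, two distinct records with equal fields end up sharing one id, which is what would later show up as $id on the first and $ref on the second.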

When you say “observed” (schrödinger answers :stuck_out_tongue: ) what do you mean exactly? What I’m understanding is that your data is always serialized, and through an inspector-like tool you deserialize it to inspect it.

So, I’m currently at a loss as to the “exact” behaviour you are experiencing xD

Anyway, while explaining this I remembered that if you provide a custom IJsonAdapter<T> for your Answer types, you can prevent them from being serialized by reference. So maybe this lets you keep them as records without some ID field, and just use the adapter so they don’t become refs.


Thanks for the explanation! Never really took the time to find out how it actually works, so this is some great insight! :slight_smile: Much appreciated.

Observing in this case is seeing the data in the application. In our application you essentially make a game, in which you can define rounds which have questions and their corresponding (in)correct answers.

The game, a round and the questions each have their own screen with their own fields, corresponding to the data structure.

In order for the user to stop and continue where they left off, we (de)serialize the data as appropriate.

So for example, when I make a new game, I enter the game screen. I select a round in which I want to make the questions and go to that screen. In that screen is a list (we use UI Toolkit and a ListView is used here to populate the questions) with all the questions so far. For a new game it is empty still.

In this screen, I add a question (or more). I can then select a question to fill in the details and I go to the question screen.

This screen holds all the specific data that makes up the question: a string with the actual question and 2 lists (both ListViews) with the correct and incorrect answers.

So as long as I have actually seen the question data (which includes the answers) in the question screen, when I serialize the data, the answers (even if they’re equal) are not serialized as refs.

Now if I close the application (or select a new game, which is where it will deserialize the data) and open a game, every question (and thus, the answers) I have seen will not be serialized as a ref. However, all the questions/answers I have not seen in the question screen, will be serialized as a reference.

:exploding_head:

I can’t help but think this is a bug, especially after your explanation. It should always be a ref when they’re equal, but it is definitely not happening for me.

Right now we’re simply disabling serialized references altogether, as we don’t want them for any of the data, by adding this as a parameter when serializing data:

JsonSerializationParameters parameters = new JsonSerializationParameters()
{
	DisableSerializedReferences = true
};

string jsonData = JsonSerialization.ToJson(SelectedProject, parameters);

To be honest you shouldn’t need to serialize all the answers and questions to make up this data, but instead some ID/GUID that points to each question/answer that has been used so far. There’s no need to actually serialise the data of the answers and questions itself; it’s all unnecessary data.

Well, that heavily depends on how you want to use that data really. In our case, the data structure is pretty simple and it will not be a lot of data (limits exist). So for the functionality so far, it wouldn’t make much sense or a difference, except that I have to manage extra data.

Initially I had set it up like you mentioned, but that also required a lot of bookkeeping and checking.

The system is still in place, as we will need it in the future when we are going to support media types and allow the user to make pre-defined lists of answers that can be assigned.

All the more reason to use it now and iron out all the kinks.

I tried to give it some thought but really cannot see how this would happen. The only thing that comes to mind is that maybe you have “DisableSerializedReferences” on in some place, but off in another?

Still, this wouldn’t make sense to me:

// from your first post
"_list": [
{
    "$id": 2,
    "answer": "1"
},
{
    "$id": 3,
    "answer": "1"
}
]

It should always have been first with $id and second with $ref (assuming they both are a record StringAnswerData).

So it may be a bug, a weird edge case, or some thing we are not seeing xD

At least you already have workarounds

It doesn’t even do that. There is no id or ref to begin with. It behaves as if it were all unique classes.

So like this:

"_list": [
{
    "answer": "1"
},
{
    "answer": "1"
}
]

The more I think about it, with the insight you have given, the more I think it might actually be a bug.

Might be worth submitting a bug and see what Unity devs have to say about it.


I agree, although to check if it really is a bug I would try a minimal repro with just a List of distinct records with the same data, and see if it reproduces.

My 2 cents are that it “won’t” reproduce. I have been through the source quite a lot (I use a modified version for my personal use case, but I’m familiar with the original) and I would really like to believe that there is something else going on at some point on your end.

But this is just me & my pride speculating :stuck_out_tongue: , I would like to think I would’ve caught this xD


Yeah, I have a special repro project for bug submission.

I’ll post an update here if I learn something. :slight_smile:

Thanks again for all the information and help, much appreciated!


Records are primarily meant to be immutable. You can read this blog article about the differences between classes and records.

Since they are usually considered immutable and by default have a value-type-like comparison even though they are actually reference types, most serializers would probably consider instances that are equal to be the same.

You do mutate your instances, so of course you run into issues when the deserialized data essentially points to the same data. This is somewhat similar to strings. Strings are immutable reference types. Strings can be “interned”, so when you have two strings with the exact same content, after deserialization (when interning is used/supported) they may actually reference one single instance. This is usually irrelevant, as immutability is essentially enforced for strings unless you use unsafe code.

So I would say since you intend to mutate the instances, you should not use records in the first place. Or if you really want to use records, you should probably use them like structs and actually replace them, maybe with the “with” keyword mentioned in the article.
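
As a rough illustration of the “replace instead of mutate” idea, here is a minimal sketch using a with expression. The Answer record is a hypothetical stand-in for StringAnswerData, and this is a plain .NET console program, not the project’s actual code.

```csharp
using System;

// Hypothetical stand-in for StringAnswerData.
public record Answer
{
    public string Text = string.Empty;
}

public static class WithDemo
{
    public static void Main()
    {
        var original = new Answer { Text = "1" };

        // "with" copies the record and applies the change to the copy,
        // so the original instance is left untouched.
        var edited = original with { Text = "2" };

        Console.WriteLine(original.Text);                     // 1
        Console.WriteLine(edited.Text);                       // 2
        Console.WriteLine(ReferenceEquals(original, edited)); // False
    }
}
```

With this pattern the caller swaps the edited copy into the list, so two list entries that happen to share one instance after deserialization cannot change together by accident.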

You said that a record seemed like the perfect solution. What exact properties of the record do you have in mind that makes it a perfect fit? I just see many issues and reasons why not to use it in your case :slight_smile:


Why they seem(ed) perfect is because all the properties and functionality it describes, fit my use case:

  • They’re immutable (more on that below)
  • Support inheritance
  • Only holds data (no logic)
  • Equality comparison

So, the immutable part has, in the current state of the project, lost a bit of its purpose. The thought behind it is that other developers should not be able to (unknowingly) adjust the data directly, but only through a command (to support undo/redo and other bookkeeping under the hood). Due to time constraints during development (deadline), this purpose got a bit lost, and having them immutable is not really relevant as of now. However, this will change in the near future, when immutability will be relevant again.

Inheritance is more something that makes my life easier. We have different kinds of questions and answers that also share plenty of base data.

Equality comparison is probably the most important one, and together with the reasons above it is why I chose records over structs. I need the data to be equal if the fields are the same, while still having different instances.

I can make it work with classes, but then I would have to implement functionality where I have to compare if 2 separate instances are equal.
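
For comparison, a minimal sketch of what that class-based route could look like, assuming a hypothetical AnswerClass and AnswerValueComparer (neither is from the actual project). The class keeps the default reference equality, and value comparison is only done on request through a separate comparer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class version of an answer; Equals is NOT overridden,
// so default equality stays reference-based.
public sealed class AnswerClass
{
    public string Text = string.Empty;
}

// Hypothetical comparer for the places where field-by-field equality is wanted.
public sealed class AnswerValueComparer : IEqualityComparer<AnswerClass>
{
    public bool Equals(AnswerClass x, AnswerClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Text == y.Text;
    }

    public int GetHashCode(AnswerClass obj) => obj.Text?.GetHashCode() ?? 0;
}

public static class ClassEqualityDemo
{
    public static void Main()
    {
        var a = new AnswerClass { Text = "1" };
        var b = new AnswerClass { Text = "1" };

        // Default equality: two instances are never equal.
        Console.WriteLine(a.Equals(b));                           // False
        // Explicit value comparison where the application needs it.
        Console.WriteLine(new AnswerValueComparer().Equals(a, b)); // True

        // A collection using default equality keeps both instances.
        var seen = new HashSet<object>();
        Console.WriteLine(seen.Add(a));                           // True
        Console.WriteLine(seen.Add(b));                           // True
    }
}
```

The trade-off is the extra comparer to maintain per type, which is exactly the bookkeeping that records avoid.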

Essentially, I could’ve used classes, structs or records, but none of them would’ve “worked out of the box”, and with the intention I have in mind, records seem to be the better fit for where the project is headed.

That being said, while I might be using records incorrectly, it still doesn’t explain the behavior I’m seeing while serializing the data.

Have you tried using another serialization library? Newtonsoft.Json or the Odin Serializer perhaps.

Personally I feel like this issue stems from serialising data that’s completely unnecessary anyway, which is creating more issues, as you need to ensure your data is designed around being serialised for save games in the first place. And the use of records only compounds that; considering Unity doesn’t natively support serialising them for the inspector, it’s likely they aren’t properly supported by its serialisation library either.

I can’t actually think of a single bit of Unity’s API, public or internal that uses records.

A simple ID and look-up system would be so much more straight forward. This is how most save systems work for things like inventory systems as well so it’s tried and tested territory.

No, I first want to see if I can reproduce it and if I can, submit a bug.

I don’t think using records (in terms of serializing them at least) as explained by Canjino a few posts up, should make any difference, but I might be wrong.

While in my case I may take on another approach with my data, it doesn’t invalidate the issue itself.