Overriding equality stuff confusion

I’ve got myself confused on how to deal with overloading the == operator and what exactly I need to do. If someone could step me through this I would be grateful. I can’t seem to get a good grasp on this from the MSDN docs.

I’ve got some custom classes that I want to store in a hashset. The Node class is basically a wrapper for the MazePosition. I want the equality stuff on the Node class to behave as if it were the MazePosition class. My code looks something like this.

public class Node {
    public static bool operator == (Node a, Node b){
        return a.mazePosition == b.mazePosition;
    }

    public static bool operator != (Node a, Node b){
        return a.mazePosition != b.mazePosition;
    }

    public override bool Equals (object obj)
    {
        Node other = obj as Node;
        if (other == null) return false;
        return this.mazePosition == other.mazePosition;
    }

    public override int GetHashCode ()
    {
        return mazePosition.GetHashCode ();
    }
}

I’m currently getting a null reference exception on line 13/3. I know why, the null check is throwing a null reference error. I just don’t know what to do with it.

It’s probably because its referencing its own == operator but you’re not returning a null exception, you’re returning false.

Maybe try ‘if (!other) return false;’ and see if it does the same thing, then try dealing with the null in your special == operator.

That’s probably the cause of the issue.

The bool operator won’t work because I haven’t overloaded it. That’s a special Unity thing.

Putting the null check into the == operator is the path I looked at. Except I can’t call the == operator to check for null from inside the == operator, this causes an infinite loop.

My head keeps bouncing around in an infinite loop too. I may leave this and come back tomorrow. MSDN is starting to make less and less sense.

Ouch, what a mess. :S

Maybe there is a different equality check that would work instead? Not sure of implications there, though.

Some interesting stuff here on the topic:
http://stackoverflow.com/questions/3005794/overload-and-of-course-operator-can-i-bypass-to-determine-whether

Your problem is that you’re checking if “other” is a null reference by trying to grab instance variables from “other” that doesn’t exist if “other” is null. This will of course always throw a null reference exception when other is null.

The culprit is this line:
return a.mazePosition== b.mazePosition;

If “a” or “b” is null, you’ll try to grab mazePosition from the objects - which you can’t if the object is null => null reference exception.

I’d add something along those lines in the start of your == operator method:

  • if(Object.ReferenceEquals(a, null) || Object.ReferenceEquals(b, null)){
  • return Object.ReferenceEquals(a, b);
  • } else {
  • // Do your comparison.
  • }
3 Likes

Since Node is a reference type, most likely you should not override the == operator. The standard is that == is always a reference check, so you might confuse someone else if you try to change that.

Also, since you are overriding Equals(), don’t forget to override GetHashCode() and implement the IEquatable interface.

2 Likes

As I already mentioned, you normally don’t override the == operator when dealing with reference types. The exception is when you are creating an object which conceptually acts as a value type. Think of a string: this is a reference type but conceptually, we think of it as a value so something like this wouldn’t work if we didn’t override the == operator:

if ("Hello World" == "Hello " + "World")
  //...

Let’s just assume for now that we really do want to override ==. I’ll point out a few things in your code.

public class Node {
    public static bool operator == (Node a, Node b){
        return a.mazePosition == b.mazePosition;
    }
}

This is not going to work if b is null. Whenever you implement binary operators, you need to deal with null inputs. Let’s change this to:

public class Node
{
  public static bool operator == (Node a, Node b)
  {
    // an item is always equal to itself
    if(object.ReferenceEquals(a, b))
      return true;

    // if both a and b were null, we would have already escaped so check if either is null
    if(object.ReferenceEquals(a,null))
      return false;
    if(object.ReferenceEquals(b,null))
      return false;

    // Now that we've made sure we are working with real objects:
    return a.mazePosition == b.mazePosition;
  }
}

Once you’ve worked this operator out, the others are actually quite trivial because you can reuse your code

public static bool operator != (Node a, Node b)
{
  return !(a == b);
}

public override bool Equals(Object obj)
{
  return (obj as Node) == this;
}

Now let’s talk IEquatable…
There are some issues with the object interface Equals(object). First, we don’t have type information, so the compiler loses access to some of its optimization tricks. The deal is worse if we are talking value types because we have to box anything we want to compare. IEquatable fixes this by providing a nicely typed interface but for backward compatibility, we still need to give the object implementation above. Let’s expand our example to include IEquatable.

public class Node : IEquatable<Node>
{
  public static bool operator == (Node a, Node b)
  {
    // an item is always equal to itself
    if(object.ReferenceEquals(a, b))
      return true;

    // if both a and b were null, we would have already escaped so check if either is null
    if(object.ReferenceEquals(a,null))
      return false;
    if(object.ReferenceEquals(b,null))
      return false;

    // Now that we've made sure we are working with real objects:
    return a.mazePosition == b.mazePosition;
  }

  public static bool operator != (Node a, Node b)
  {
    return !(a == b);
  }

  public override bool Equals(Object obj)
  {
    return Equals(obj as Node);
  }

  public bool Equals(Node other)
  {
    return other == this;
  }
}

Now for the super grey area: GetHashCode(). You’ll find all kinds of contradictory advice for this one…

  • GetHashCode should match the Equals algorithm
  • GetHashCode should be faster than the Equals algorithm
  • An objects HashCode should never change after object creation
  • An objects HashCode should always represent its attributes

To be honest, I’m still a bit fuzzy on all this but as far as I understand it, the confusion comes up when you are talking about mutable vs immutable objects. Here are the facts:

  • Equals must be

  • reflexive: an object always equals itself

  • symmetric: if x.Equals(y), then y.Equals(x)

  • transitive: if x.Equals(y) and y.Equals(z) then x.Equals(z)

  • idempotent: a call to Equals must not change the state of either object and must return a consistent value

  • Objects that are equal must have the same HashCode. The opposite is not true: objects with the same HashCode are not necessarily equal.

It’s also helpful to understand how a hash is used. Lets talk about HashSet since it’s obvious this guy uses GetHashCode.

Consider the case when we are looking for an object. If we had 1000 T’s in a big pile (List), we need to do an equality comparison with each object. If the equality comparison involves 100 attributes, that will take: 1000 items × 100 attribute comparisons = 100000 comparisons.

With a HashSet, we instead calculate an int to represent the value of these 100 attributes, and then we compare the integers. So we have 1000 items + 1 item to search × 100 attributes to hash = 100000 operations and then we get to do 100 comparisons. Worse: since an object’s hash does not necessarily mean the objects are equal, we need to do an equals comparison on any matching objects anyways! So hang on a tick… How can doing 100000 operations PLUS 100 comparisons PLUS the equality comparison with n × 100 comparisons (where n is the number of items with colliding hashes) be faster than just doing the 100000 comparisons? I believe this is where the “GetHashCode algorithm should be faster than the Equals algorithm” recommendation comes from. However, there is more to this…

The trick is that a HashSet calculates an object’s hash just once, when it is first added to the set. After that, looking for items is fast because we only need to calculate one hash of the item we are looking for. Then we can do 1000 comparisons instead of 100000 comparisons.

Okay, with all that information, let’s take another look at the “advice” I listed earlier:

  1. GetHashCode should match the Equals algorithm

  2. GetHashCode should be faster than the Equals algorithm
    These sound incompatible but both pieces of advice are good. The truth is that we need to balance our GetHashCode algorithm to achieve two things: a) be fast to calculate and b) generate as few collisions as possible. Remember that in a HashSet, the purpose of an object’s hash is to quickly reduce the number of possible matches so we don’t need to do an expensive Equals comparison on every item. From this point of view, it makes sense to simplify the GetHashCode algorithm so that it can be fast. However, don’t forget that an objects hash is remembered by the HashSet, so it is only calculated once. This is already a big speed boost, so maybe we don’t need to worry about increasing the speed of our GetHashCode algorithm. Besides, we want to make sure we don’t have too many collisions otherwise we lose the benefit of our HashSet. Personally, I would make my GetHashCode algorithm very similar to my Equals algorithm though there is a lot more to generating good hashes than just adding all the properties together. There is an entire field of study related to this and it’s way more than I’m willing to type right now.

  3. An object’s HashCode should never change after object creation

  4. An object’s HashCode should always represent its attributes
    Here’s where things get a bit fuzzy for me. What happens if our objects are mutable and we change some of them after we added them to the HashSet. Is the hash recalculated? How do we find an item if it gets moved around in the HashSet? I’m afraid I can’t answer that off the top of my head. Anyways, add it to the list of reasons to prefer immutable objects.

Yikes! Time to just implement our GetHashCode already…

public class Node : IEquatable<Node>
{
  private readonly MazePosition mazePosition;
  public Node(MazePosition mazePosition)
  {
    this.mazePosition = mazePosition;
  }

  public static bool operator == (Node a, Node b)
  {
    // an item is always equal to itself
    if(object.ReferenceEquals(a, b))
      return true;

    // if both a and b were null, we would have already escaped so check if either is null
    if(object.ReferenceEquals(a,null))
      return false;
    if(object.ReferenceEquals(b,null))
      return false;

    // Now that we've made sure we are working with real objects:
    return a.mazePosition == b.mazePosition;
  }

  public static bool operator != (Node a, Node b)
  {
    return !(a == b);
  }

  public override bool Equals(Object obj)
  {
    return Equals(obj as Node);
  }

  public bool Equals(Node other)
  {
    return other == this;
  }

  public override int GetHashCode()
  {
    // Life is easy when there is only one property!
    // Then again.. what was the point of all this?
    return mazePosition.GetHashCode();
  }
}
1 Like

Personally I wouldn’t even use IEquatable. It’s still indirect, and still relies on the object to handle what it considers equality or not. Anyways, all that happens is that the HashSet will ask EqualityComparer for the default comparer, it’ll recognize the type as an IEquatable, and give the default IEquatable EqualityComparer for that type.

What you could do instead is just define the EqualityComparer yourself:

    public class NodeMazePositionEqualityComparer : IEqualityComparer<Node>
    {

        private static NodeMazePositionEqualityComparer _default;
        public static NodeMazePositionEqualityComparer Default
        {
            get
            {
                if (_default == null) _default = new NodeMazePositionEqualityComparer();
                return _default;
            }
        }


        public bool Equals(Node x, Node y)
        {
            return x.mazePosition == y.mazePosition;
        }

        public int GetHashCode(Node obj)
        {
            return obj.mazePosition.GetHashCode();
        }
    }

I include a static method for a default equality comparer, cause really, you only ever need one in existence (this is the same thing that the DefaultEqualityComparer does internally in .Net).

Than your hashset is simply:

var hashSet = new HashSet<Node>(NodeMazePositionEqualityComparer.Default);

This also means you could create equality comparers for the same object of different significance. I do this at times when I have a dictionary that keys off one property for uniqueness of id (so if say another object with the same value in its Id property is added, it overwrites the old)… but than I might need that same type to be stored in another set where more than one object of the same ‘Id’ can be stored.

I only ever let the object perform ANY equality comparison if at ALL times, no matter what, that its equal based on those parameters. This is very seldom. It might be a type whose internal implementation is constantly in flux, and needs to ensure compatibility across multiple versions of itself. Or requires comparison with values of a different type than itself.

Here’s an example of one, and as I said, it’s a weird case:

2 Likes

This is nice because it separates the concern of equality. However, I’m curious, don’t you still need to test for null arguments? And is there a similar way to separate operator overloads?

Another nice abstraction.

I think I would still implement IEquatable as a matter of completion. If I was new to a framework, I wouldn’t necessarily think to pass an EqualityComparer to the Collection constructors.

1 Like

Yeah, I just quickly slapped that example together, I did forget the null check (I usually am writing these with structs in mind).

There is nothing like this for operator overloading. Operators don’t abstract that way. They’re really just syntax sugar definitions… the compiler finds all static operator functions that fulfill the type signature of the values/objects involved. This included functions that may result from any implicit conversion defined on either of the types. And uses the best found result, just like usual operator overloading.

Do note, the implicit paths can result in 2 functions be determined to be equal in priority, and you get that annoying “ambiguous function definitions” compiler error.

This is why usually you just define operators in the context of the type you’re working with, and then write implicit conversion methods. Which of course only really works for structs. Which just reaffirms the whole operators should only be used with structs thing that you brought up earlier… object == operator should always default back to the default operation of reference testing (which is performed via the Object.Equals function all types can override).

I can feel ya on that. I just have always came about it from the other direction around. I DO presume my collection to access an equality comparer. Instead of expecting the collection to only support objects that implement some function, rather abstract that function to an object (noting a more pure object oriented approach, rather than delegates…).

Lets take for instance you have a set of objects you can’t control the implementation of. It’s from a third party. But you need to compare them in a set by some system. Because you can’t implement such a function. You’d need some other object to delegate the job of comparison to.

Or a non-generic collection (pre .Net 2.0 and , when System.Collections.Generic didn’t exist). Such a collection could hold multiple objects of different types… which type defines what sort of comparison is used? This is actually why all objects have an ‘Equals’ function so that it has a default behaviour if one doesn’t supply a comparer (otherwise such a collection of ints wouldn’t compare correctly as reference equals wouldn’t resolve appropriately).

Furthermore, I think of it with the history of OOP in mind. Interfaces are a new concept, and so are generics/templating. Lets take the quintesential object oriented language that popularized it to the masses… C++. C++ doesn’t have interfaces, and the concept of abstract classes as contracts is more of a design pattern rather than a language standard. And generics/templating is very new (relative to the age of C++). Lets say you created a general pointer collection… it just contains pointers (think like ArrayList or List). It’s a lot of work for a collection to figure out what the thing is actually pointing at, and if it has to cover ALL possibilities, it’s going to have to go through a long chain of commands to do so. Where as if you gave it a comparer, the comparer would define the specific rules of comparison. Maybe it just tests if the pointer is pointing at the same spot in memory, another might compare the size of the objects pointed at in memory, another might check if the first byte of data is equal, another might coerce it into some known type and compare based on that types interface (not to be confused with interface).

Anyways, yeah, in a pure object oriented approach, I almost certainly would expect another object to handle the task of what compares the values if that comparison isn’t anything but a one to one comparison (reference equals, or value match, for class and struct respectively).

2 Likes

Thanks guys.

It seems like object.ReferenceEquals was the key step I was missing in my implementation. But also there is a lot of other stuff I was missing. I think I have a better understanding of the various equals bits and bobs now. So to get value like behaviour on a class I need to:

  • Overload ==
  • Overload !=
  • Override object.Equals(object) and object.Equals (Node)
  • Override object.GetHashCode ()
  • Implement IEquatable
  • Implement IEqualityComparer
  • Use object.RefereceEquals to deal with null checks

There are other properties, I stripped them out as they seemed irrelevant to the question.

I’m overriding == because Node is meant to behave conceptually like a value type. However it can’t be a value type as is requires references to other Node objects. (At least that was my understanding of the error messages I got when I tried making Node a struct). All of the properties of a Node are dependent on the MazePosition, so if two Nodes have the same MazePosition all of their other properties will be identical. The trouble I was encountering was getting this equality acknowledged as I put the Nodes in and out of HashSets.

I’m currently using this for a pathfinding solution. I generate a Node for each MapPosition as it is encountered in my graph search. And then check if a Node for that MapPosition already exists inside one of my HashSets.

Rereading everyone’s posts and after a good nights sleep its dawned on me that a Dictionary<MapPosition, Node> may end up suiting my use case better. That way I can just use the standard equals implementation, and I also don’t have the overhead of generating a new Node if I’ve already done so. I can also go back to my original plan of making a Node a struct by giving it a reference to the MapPosition keyed to another Node.

1 Like

Two clarifying points that jumped out at me:

  • You override Equals(object) but Equals(Node) is part of implementing IEquatable (I screwed this up in my earlier post, so I’m going to fix it)
  • Implementing IEqualityComparer is done on a different class

If you can avoid making these value type behaving reference classes, that sounds like a win to me. I think I like this approach better.

1 Like

No to that bolded one.

The object itself should not be its own IEqualityComparer (although technically it could be).

IEqualityComparer is a way to create a distinct comparison method independent of the type being compared.

2 Likes

Oh, I see. I missed that in your earlier post. I was wondering why both IEquatable and IEqualityComparer existed, now I get it.

I really don’t feel so bad now that I couldn’t figure this out on my own from the docs. In my home field of chemical engineering we don’t have anywhere near as many different versions of equals. :slight_smile: Thanks again.

I think I do too. Its along the lines of my general principle of if I can’t figure out a simple way to do something, I must be doing the wrong thing. Still, its been good for me to understand the various options and implications for equality.

Also, you can throw out the whole IEquatble part, it’s an object, it’s not really necessary. The default ‘Equals(object obj)’ method can suffice.

And no, you don’t have to use object.ReferenceEquals to perform null checks, since you can test for null in the == and != operator overloads.

1 Like

Got ya. But within the == overload I would be null testing with object.ReferenceEquals?

yes

1 Like

So reporting back in. Set up the null checks in == and it works a charm. Thanks everybody.

I’ll go back and do the dictionary thing later during optimisation. For now pathfinding is working and I can move onto my next mechanic!

1 Like