As I already mentioned, you normally don’t override the == operator when dealing with reference types. The exception is when you are creating an object which conceptually acts as a value type. Think of a string: this is a reference type but conceptually, we think of it as a value so something like this wouldn’t work if we didn’t override the == operator:
if ("Hello World" == "Hello " + "World")
//...
Let’s just assume for now that we really do want to override ==. I’ll point out a few things in your code.
public class Node {
public static bool operator == (Node a, Node b){
return a.mazePosition == b.mazePosition;
}
}
This is not going to work if b is null. Whenever you implement binary operators, you need to deal with null inputs. Let’s change this to:
public class Node
{
public static bool operator == (Node a, Node b)
{
// an item is always equal to itself
if(object.ReferenceEquals(a, b))
return true;
// if both a and b were null, we would have already escaped so check if either is null
if(object.ReferenceEquals(a,null))
return false;
if(object.ReferenceEquals(b,null))
return false;
// Now that we've made sure we are working with real objects:
return a.mazePosition == b.mazePosition;
}
}
Once you’ve worked this operator out, the others are actually quite trivial because you can reuse your code
public static bool operator != (Node a, Node b)
{
return !(a == b);
}
public override bool Equals(Object obj)
{
return (obj as Node) == this;
}
Now let’s talk IEquatable…
There are some issues with the object interface Equals(object). First, we don’t have type information, so the compiler loses access to some of its optimization tricks. The deal is worse if we are talking value types because we have to box anything we want to compare. IEquatable fixes this by providing a nicely typed interface but for backward compatibility, we still need to give the object implementation above. Let’s expand our example to include IEquatable.
public class Node : IEquatable<Node>
{
public static bool operator == (Node a, Node b)
{
// an item is always equal to itself
if(object.ReferenceEquals(a, b))
return true;
// if both a and b were null, we would have already escaped so check if either is null
if(object.ReferenceEquals(a,null))
return false;
if(object.ReferenceEquals(b,null))
return false;
// Now that we've made sure we are working with real objects:
return a.mazePosition == b.mazePosition;
}
public static bool operator != (Node a, Node b)
{
return !(a == b);
}
public override bool Equals(Object obj)
{
return Equals(obj as Node);
}
public bool Equals(Node other)
{
return other == this;
}
}
Now for the super grey area: GetHashCode(). You’ll find all kinds of contradictory advice for this one…
- GetHashCode should match the Equals algorithm
- GetHashCode should be faster than the Equals algorithm
- An objects HashCode should never change after object creation
- An objects HashCode should always represent its attributes
To be honest, I’m still a bit fuzzy on all this but as far as I understand it, the confusion comes up when you are talking about mutable vs immutable objects. Here are the facts:
-
Equals must be
-
reflexive: an object always equals itself
-
symmetric: if x.Equals(y), then y.Equals(x)
-
transitive: if x.Equals(y) and y.Equals(z) then x.Equals(z)
-
idempotent: a call to Equals must not change the state of either object and must return a consistent value
-
Objects that are equal must have the same HashCode. The opposite is not true: objects with the same HashCode are not necessarily equal.
It’s also helpful to understand how a hash is used. Lets talk about HashSet since it’s obvious this guy uses GetHashCode.
Consider the case when we are looking for an object. If we had 1000 T’s in a big pile (List), we need to do an equality comparison with each object. If the equality comparison involves 100 attributes, that will take: 1000 items × 100 attribute comparisons = 100000 comparisons.
With a HashSet, we instead calculate an int to represent the value of these 100 attributes, and then we compare the integers. So we have 1000 items + 1 item to search × 100 attributes to hash = 100000 operations and then we get to do 100 comparisons. Worse: since an object’s hash does not necessarily mean the objects are equal, we need to do an equals comparison on any matching objects anyways! So hang on a tick… How can doing 100000 operations PLUS 100 comparisons PLUS the equality comparison with n × 100 comparisons (where n is the number of items with colliding hashes) be faster than just doing the 100000 comparisons? I believe this is where the “GetHashCode algorithm should be faster than the Equals algorithm” recommendation comes from. However, there is more to this…
The trick is that a HashSet calculates an object’s hash just once, when it is first added to the set. After that, looking for items is fast because we only need to calculate one hash of the item we are looking for. Then we can do 1000 comparisons instead of 100000 comparisons.
Okay, with all that information, let’s take another look at the “advice” I listed earlier:
-
GetHashCode should match the Equals algorithm
-
GetHashCode should be faster than the Equals algorithm
These sound incompatible but both pieces of advice are good. The truth is that we need to balance our GetHashCode algorithm to achieve two things: a) be fast to calculate and b) generate as few collisions as possible. Remember that in a HashSet, the purpose of an object’s hash is to quickly reduce the number of possible matches so we don’t need to do an expensive Equals comparison on every item. From this point of view, it makes sense to simplify the GetHashCode algorithm so that it can be fast. However, don’t forget that an objects hash is remembered by the HashSet, so it is only calculated once. This is already a big speed boost, so maybe we don’t need to worry about increasing the speed of our GetHashCode algorithm. Besides, we want to make sure we don’t have too many collisions otherwise we lose the benefit of our HashSet. Personally, I would make my GetHashCode algorithm very similar to my Equals algorithm though there is a lot more to generating good hashes than just adding all the properties together. There is an entire field of study related to this and it’s way more than I’m willing to type right now.
-
An object’s HashCode should never change after object creation
-
An object’s HashCode should always represent its attributes
Here’s where things get a bit fuzzy for me. What happens if our objects are mutable and we change some of them after we added them to the HashSet. Is the hash recalculated? How do we find an item if it gets moved around in the HashSet? I’m afraid I can’t answer that off the top of my head. Anyways, add it to the list of reasons to prefer immutable objects.
Yikes! Time to just implement our GetHashCode already…
public class Node : IEquatable<Node>
{
private readonly MazePosition mazePosition;
public Node(MazePosition mazePosition)
{
this.mazePosition = mazePosition;
}
public static bool operator == (Node a, Node b)
{
// an item is always equal to itself
if(object.ReferenceEquals(a, b))
return true;
// if both a and b were null, we would have already escaped so check if either is null
if(object.ReferenceEquals(a,null))
return false;
if(object.ReferenceEquals(b,null))
return false;
// Now that we've made sure we are working with real objects:
return a.mazePosition == b.mazePosition;
}
public static bool operator != (Node a, Node b)
{
return !(a == b);
}
public override bool Equals(Object obj)
{
return Equals(obj as Node);
}
public bool Equals(Node other)
{
return other == this;
}
public override int GetHashCode()
{
// Life is easy when there is only one property!
// Then again.. what was the point of all this?
return mazePosition.GetHashCode();
}
}