HashSet.Add(0) (and only 0?), when already exists, causes garbage?

So when I was checking my profiler, I realized something was causing lots of garbage and after some debugging, it seems that when, and possibly only when, adding 0 to a HashSet that already contains 0 will cause 20B (im guessing B means byte, but who really knows the intention of the creator).
That is a big issue… (at least for me in this case)
It is only if it already contains the 0 already since when doing hash.Clear() and then hash.Add(0) there is no garbage.
I tested it on a few numbers and I only see the issue with number 0.
Edit - Apparently int.MinValue also causes garbage? MaxValue doesnt though. This makes me worried that there are more values that will cause garbage, but I am hoping that 0 and int.MinValue are just special somehow. How would I even test this?
Edit 2 - I did a small test by first adding numbers 0 - 9999998 (and negatives) and then trying to add the same numbers again in update. I only saw 20B being allocated, which tells me that only 1 thing was causing garbage which was 0. Since thats large enough for me, Ill only worry about 0, but maybe 0 and the int.MinValue are just special and produce garbage.

Here is the code causing garbage

using UnityEngine;
using System.Collections.Generic;

public class TestHashContains : MonoBehaviour
{
    HashSet<int> hash = new HashSet<int>();

    void Update()
    {
        for(int i = 0; i < 1000; i++)
        {
            hash.Add(0);
        }
    }
}

And here is the profiler

It seems the issue is the SlotsContainsAt (called AddIfNotPresent in .Net) method which checks whether or not the item already exists.

Here is the Mono code that I found online (not sure if same as unity)
Click for code

bool SlotsContainsAt (int index, int hash, T item)
{
    int current = table [index] - 1;
    while (current != NO_SLOT) {
        Link link = links [current];
        if (link.HashCode == hash && ((hash == HASH_FLAG && (item == null || null == slots [current])) ? (item == null && null == slots [current]) : comparer.Equals (item, slots [current])))
            return true;

        current = link.Next;
    }

    return false;
}

This might just be a C# Mono thing.
Here is the sourcecode for HashSet in .Net and Mono
.Net: HashSet.cs source code in C# .NET
Mono: HashSet.cs | searchcode

Got any ideas?
Anyone possibly know any workarounds? Is this bug reporting worthy? Do they take bug reports for Mono stuff?

Edit - submitted a bug report

Hmm, it seems to have to do with the hashcode being 0

For example…

using UnityEngine;
using System.Collections.Generic;

public class TestHashContains : MonoBehaviour
{
    HashSet<int> hash = new HashSet<int>(IntComparer.defaultComparer);

    void Update()
    {
        for(int i = 0; i < 1000; i++)
        {
            hash.Add(1);
        }
    }
}

public class IntComparer : IEqualityComparer<int>
{
    public static IntComparer defaultComparer = new IntComparer();

    public bool Equals(int x, int y)
    {
        return x == y;
    }

    public int GetHashCode(int value)
    {
        if(value == 0) return 1;
        if(value == 1) return 0;

        return value;
    }
}

With the custom IEqualityComparer, if I swap 0 and 1 hashcodes, then hash.Add(0) no longer causes garbage, but hash.Add(1) is now what is causing garbage.

I guess I can use this to my advantage for a workaround. Since in my case I am checking triangle indices, there will be no negative numbers, so I can make my GetHashCode method like this and all should be fine…

    public int GetHashCode(int value)
    {
        if(value == 0) return -1;

        return value;
    }

If I am missing something, please let me know.

Edit - I was going to use int.MinValue instead of -1, but apparently int.MinValue also causes garbage? MaxValue doesnt though. This makes me worried that there are more values that will cause garbage, but I am hoping that 0 and int.MinValue are just special somehow. How would I even test this?
Edit 2 - I did a small test by first adding numbers 0 - 9999998 (and negatives) and then trying to add the same numbers again in update. I only saw 20B being allocated, which tells me that only 1 thing was causing garbage which was 0. Since thats large enough for me, Ill only worry about 0, but maybe 0 and the int.MinValue are just special and produce garbage.

This is strange, but Unity’s old Mono version has a bunch of bugs where things that shouldn’t create garbage does.

Should you report it? Probably, but lately, all my bug reports about Mono issues this gets a “We’re postponing this bug until after Mono runtime will be upgraded” reply. If they’re postponing things like the compiler crashing on valid code (which they’re doing), they’re probably going to postpone a bit of garbage.

Report it anyway, if the bug’s around after the upgrade, it’ll probably be handled.

1 Like

If you look at System.Core dll (where HashSet is located) and check the ‘Add’ method you find this:

    /// <summary>
    /// Adds the specified element to a set.
    /// </summary>
    ///
    /// <returns>
    /// true if the element is added to the <see cref="T:System.Collections.Generic.HashSet`1"/> object; false if the element is already present.
    /// </returns>
    /// <param name="item">The element to add to the set.</param>
    public bool Add(T item)
    {
      int itemHashCode = this.GetItemHashCode(item);
      int index1 = (itemHashCode & int.MaxValue) % this.table.Length;
      if (this.SlotsContainsAt(index1, itemHashCode, item))
        return false;
      if (++this.count > this.threshold)
      {
        this.Resize();
        index1 = (itemHashCode & int.MaxValue) % this.table.Length;
      }
      int index2 = this.empty_slot;
      if (index2 == -1)
        index2 = this.touched++;
      else
        this.empty_slot = this.links[index2].Next;
      this.links[index2].HashCode = itemHashCode;
      this.links[index2].Next = this.table[index1] - 1;
      this.table[index1] = index2 + 1;
      this.slots[index2] = item;
      ++this.generation;
      return true;
    }

So the first thing it does is basically check if ‘item’ is already a member of the HashSet. It does this by getting the hashcode, then checking if it is in any slots.

    private int GetItemHashCode(T item)
    {
      if ((object) item == null)
        return int.MinValue;
      return this.comparer.GetHashCode(item) | int.MinValue;
    }

    private bool SlotsContainsAt(int index, int hash, T item)
    {
      HashSet<T>.Link link;
      for (int index1 = this.table[index] - 1; index1 != -1; index1 = link.Next)
      {
        link = this.links[index1];
        if (link.HashCode == hash && (hash != int.MinValue || (object) item != null && (object) this.slots[index1] != null ? (this.comparer.Equals(item, this.slots[index1]) ? 1 : 0) : ((object) item != null ? 0 : (null == (object) this.slots[index1] ? 1 : 0))) != 0)
          return true;
      }
      return false;
    }

You’ll notice both of this code does a lot of ‘object’ casting to test if null (the class is generic, so it doesn’t know if it’s dealing with a ref type or a value type).

This is where your garbage is coming from.

Although you should be able to compare a generic like that to null and the compiler optimize that without having to cast… but I’m betting this old version of mono didn’t have that yet or something.

Would you look at that, a bad implementation that’s not present in the latest Mono version!

3 Likes

lol

Im a bit confused, if the garbage is coming from the object cast, shouldnt I always see garbage happening? Why is there only garbage for 0 and int.MinValue (might be more, but I think its just those 2).
Is it because of that if statement within the SlotsContainsAt method? Since its using int.MinValue and 0 for comparing stuff, extra code ends up being ran which causes the garbage?

I’ll also make the bug report when I get to my computer later (on tablet atm) and ill post it here too. At least theres a workaround though.

Honestly, I can’t tell you exactly where or what, as I’m not at a computer with Unity right now.

I’m willing to bet it might happen with other numbers as well, it really hinges on the hash that is calculated.