Comparing two lists with duplicates?

I have two lists that hold a scriptableobject type (ItemData).

I want to check if they contain the same elements of ItemData. Order does not matter, however duplicates does matter.

Example 1
List 1: Apple, Banana, Apple
List 2: Banana, Apple, Apple

Ideally, this comparison would be true.

Example 2:
List 1: Apple, Banana
List 2: Apple, Banana, Banana

Ideally, this comparison would be false.

I’m looking for a clean way to make sure the lists match up (again, order doesn’t matter) and it supports duplicates. Would anyone have any suggestions?

I’ve tried a few things, such as List.Except but it doesn’t take factor in duplicates.

One way to do this would be to sort both lists, and then iterate them in parallel to make sure that they match at every index. (If you need to preserve the original order, you could potentially copy both lists, sort the copies, and then compare those. Of course, depending on the size of the lists, this may be expensive in memory.)

If the items in the list are drawn from a small domain that is known in advance, you could just count the number of each unique thing in each list, then make sure the counts match. (In fact, you can just count upwards with one list and count downwards with the other, rather than making separate counts for each.)

Regardless of how you do this, you may want to start by comparing the lengths of the lists, since if the lengths don’t match then you know immediately the lists don’t match either, and you won’t have to perform more-expensive checks.

1 Like

This problem can essentially be reduced to the anagram problem. I.e.:

Given two strings, return true if the two strings contain the same characters, where order doesn’t matter. For example “rat” and “art” are anagrams, but “tart” and “rat” are not.

The typical solution to that problem is to sort the strings and compare. So “art” and “rat” become “art”, and “tart” becomes “artt”.

So if you take that problem, and just replace “characters in a string” with “ScriptableObjects in a list”, it’s the same problem and the same solution can be used. In this case I’d say sort the lists by the ScriptableObjects’ GetInstanceId(), and then simply compare the sorted lists using List.SequenceEqual(). All of this is pretty much what @Antistone said :smile:

1 Like

Thanks for your input, @Antistone and @PraetorBlue !

Here’s what I ended up doing… Not sure what else to do about that foreach loop (I hate it, but it’s basically a list containing scriptableobjects that contain their own lists… those lists are what I’m ultimately comparing against)

Feedback is more than welcome!

_itemsToCombine = _itemsToCombine.OrderBy(x =>x.itemName).ToList();

foreach(var recipe in _craftData) 
{
  if (recipe.ingredients.Count != _itemsToCombine.Count) continue;

  var isEqual = _itemsToCombine.
  SequenceEqual(recipe.ingredients.
  OrderBy(x =>x.itemName));

  if (!isEqual) continue;
  Debug.Log(recipe.itemCrafted.itemName);
  break;
}

Another way is GroupBy. If you only need to check for each unique element how often it is in the list, you can easily get that as well, see the answers to LINQ with groupby and count.

1 Like

Okay, so I actually got kind of crafty. I use Odin for my recipe editor.
I was able to use a couple of its attributes to detect when they list is changed and I auto-sort it by name from there. Then I don’t have to do it at runtime.

I’m not editing the recipes themselves at runtime, so this is okay… It saves an OrderBy each time a combination is processed.

So I could literally do:

if (_itemsToCombine.SequenceEqual(recipe.ingredients))
{
   //do stuff
}

Thank you for the help everyone :slight_smile:

1 Like

If your list of recipes doesn’t change, you could invent some sort of hash algorithm over lists of ingredients and then store the recipes in a hash table (such as a Dictionary).

That’s actually a really good point. I’m going to explore this more when I get back to my desk. Thank you!

You would still need to sort the results, no? (Otherwise, you could end up with “2 Apples, 1 Banana” vs “1 Banana, 2 Apples”.)

1 Like

In the original post, it says that order doesn’t matter, only duplicates does, so there is no need to sort the results. What one does have to do is for each group from one grouping for one list, find the corresponding group from the grouping of the other list.

Any how exactly is that faster or better for performance? How would cou compare the “groups” which could have variable length and in random order? This is almost the same as the original problem. Sorting with quick sort would probably be way faster in the end.

The fact that order doesn’t matter is why you DO sort the results. If order did matter, then you couldn’t sort, because it would change the answer! But since order doesn’t matter, sorting is both simpler and faster than the alternative; it avoids searching the entire second list for every item in the first list, which would be O(n^2), and replaces it with O(n log n).

2 Likes

More on that, sorting is one of the major tricks in solving many list problems. O(nlogn) is crazy fast. For 1,000 items, nlogn is roughly 10,000 steps, which isn’t much more than walking through the list a single time. Compare that to a nested loop, which is a million steps – sort-it-first methods runs in 1% of that time (since everything you do after sorting is just a quick single loop).

Even if you don’t want to sort the actual list, making an extra list with just references is relatively cheap, space-wise. And C#'s built-in sort and the “new” lambda functions makes it easy enough to sort however you want. And passing functions in C# is about the same as in every other language and is a good trick to know.

And sorting almost always makes problems soo easy to solve. For any list problem that takes more than a single loop “sort it first” is the standard thing to try.