What's the best collection type for filtering through the data base?

Hello, people!

I’ve read somewhere that there are more collection types in C# than grains of sand on any given beach, and that 120% of the time you don’t need to write your own.

So, I’d love to be pointed towards a collection type that suits my needs. Let me elaborate.

Let’s say I have a simple Employee class:

public class Employee {
    public string name;
    public Sector sector;
    public int salary;
    public bool isFreelancer;
}

public enum Sector {
    HumanResources,
    Marketing,
    Administration,
    Sales,
    Art,
    SoftwareEngineering
}

I want a collection that contains those employees and can return me a list of employees with a specific criteria without having to loop through the entire collection. Say I want all employees that are free lancers or maybe all employees that work at Sales. Bonus if it can combine multiple criteria (is free lancer and works at Art, for example).

Does such a collection exist or should I write my own?

How much data do you have? Are you looking for a blazing fast efficient solution, or are you just looking for something to make your life as a programmer easier?

Depending on these answers the correct tool could be anything from a regular old List and the use of LINQ extensions like Select() and Where(), to a SQL database with field indexing etc…

1 Like

It’s going to be something in the hundreds of entries, and I’m more concerned about speed than ease of use for me, specifically because I want to implement a system where the user can type @ anywhere on the descriptive fields and a list of possible “links” will show in a suggestion box.

Hundreds of entries is not a lot. You can probably just get away with using a List and Linq. If I were you I’d try that first and see if the performance is good enough. That will by far be the simplest system to write.

1 Like

Let me expand on the “suggestion” system:

It’s going to be a suggestion box that will contain a collection of entries that will filter and rearrange itself based on what the user types after the @.
Let’s say the user types “@”, a list of every possible employee appears, and those that work on the same Sector as the user is working at the moment will appear at the top, then the user types “a”, becoming “@a”, now those that start with a and work in the same Sector come first, then those that start with a but work in another sector, then those that have “a” anywhere on their name. Those that don’t match any of those criteria disappear from the list.

Indeed it’s not a lot of entries. Could List + Linq achieve a smooth performance on a mobile platform with the mentioned “suggestion” system?

You’d really have to try it out. Mobile platforms also range considerably in their performance characteristics so you’d have to define a “minimum” device and make sure it runs smoothly there.

So at least for the autocomplete/suggestion functionality there is a well known data structure that makes this kind of query very fast called a “trie”, or “prefix tree”: https://en.wikipedia.org/wiki/Trie.

Honestly your system seems like it might benefit from the use of a relational database, given the complexity of the results you’re looking for, even though your data size is not large.

1 Like

Indeed. I’m aiming at some low- to-mid end devices.

Do Tries differ from the “contains” or “starts” methods that are available for strings?

I see. How could I get started on a relational database?

Two completely different things, yes. Those methods on a string examine a single string to find out details about it. A trie is a way of storing many strings such that given a partial prefix string, you can very efficiently get a list of strings that start with that prefix, without looking at every single string.

SqlLite is a lightweight database that is commonly used as an embedded piece of other software. Google around for some tutorials on “sqlLite in Unity”.

1 Like

I see!
Is there any unity type for storing data in tries, or do I have to write my own?

Will do. Thanks!

EDIT: After a quick search, it appears to me that a relational database is exactly what I need. Would you say that SQLite is the way to go for the situation described?

EDIT 2: Have you ever seen this plugin? github, medium
Is it trustworthy?

EDIT 3: I’m thinking that it may be simpler and more worthwhile for me to learn about ECS and multithreading and use the Jobs system to run parallel searches on the collection with a simple Dictionary<string, Employee>. My reasoning is that most likely this will be good enough in performance for the search, and multithreading will surely have more uses for me in the future (and probably even inside the same project) than using a SQL database.
Do you think this makes sense?

So, I’ve created some code to run tests.

It’s meant to filter through Dictionaries of any given type.

This is the extension method:

    public static Dictionary<TKey, TValue> Filter<TKey, TValue>(this Dictionary<TKey, TValue> dictionary, Func<KeyValuePair<TKey, TValue>, bool> condition) {

        Stack<KeyValuePair<TKey, TValue>> stack = new Stack<KeyValuePair<TKey, TValue>>();
        Dictionary<TKey, TValue> filteredDictionary = new Dictionary<TKey, TValue>();

        foreach (var entry in dictionary) {
            stack.Push(entry);
        }

        while (stack.Count > 0) {
            KeyValuePair<TKey, TValue> entry = stack.Pop();

            if (condition(entry)) {
                filteredDictionary.Add(entry.Key, entry.Value);
                dictionary.Remove(entry.Key);
            }
        }

        return filteredDictionary;
    }

Then the code to generate a random list of employees:

    void CreateListOfRandomEmployees(int numberOfEmployees) {
        Stack<string> nameStack = new Stack<string>();
        foreach (string name in randomNames) {
            nameStack.Push(name);
        }



        for (int i = 0; i < numberOfEmployees; i++) {
            string getName = nameStack.Pop();

            KeyValuePair<string, Employee> newEmployee = new KeyValuePair<string, Employee>(getName, new Employee() {
                name = getName,
                sector = (Sector)UnityEngine.Random.Range(0, 5),
                salary = UnityEngine.Random.Range(1000, 3500),
                isFreelancer = UnityEngine.Random.value > 0.95f
            });

            listOfEmployees.Add(newEmployee.Key, newEmployee.Value);
        }

    }

Where the randomNames is just a string array with 1000 random names. I wasn’t worried about performance here since those “employees” are just to test.

Then, the test itself:

    private void Update() {

        int numberOfTests = 10;


        string input = "B";
        Sector sector = (Sector)UnityEngine.Random.Range(0, 5);


        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
        sw.Start();

        for (int i = 0; i < numberOfTests; i++) {


            Dictionary<string, Employee> filteredList = new Dictionary<string, Employee>();
            Dictionary<string, Employee> topPriority = new Dictionary<string, Employee>();
            Dictionary<string, Employee> midPriority = new Dictionary<string, Employee>();
            Dictionary<string, Employee> lowPriority = new Dictionary<string, Employee>();
            Dictionary<string, Employee> midFilterList = new Dictionary<string, Employee>();

            foreach (var entry in listOfEmployees) {
                midFilterList.Add(entry.Key, entry.Value);
            }

            topPriority = midFilterList.Filter(e => e.Value.sector == sector && e.Value.name.StartsWith(input));
            midPriority = midFilterList.Filter(e => e.Value.name.StartsWith(input));
            lowPriority = midFilterList.Filter(e => e.Value.sector == sector && e.Value.name.Contains(input));

            foreach (var entry in topPriority) {
                filteredList.Add(entry.Key, entry.Value);
            }

            foreach (var entry in midPriority) {
                filteredList.Add(entry.Key, entry.Value);
            }

            foreach (var entry in lowPriority) {
                filteredList.Add(entry.Key, entry.Value);
            }



        }

        sw.Stop();
        Debug.Log($"Elapsed: {sw.ElapsedMilliseconds} ms");
    }

I’m getting between 14 ms and 19 ms running with the Editor on.

Since this represents 10 searches (although probably not truly representative since it’s using the same input string and sector for each of those 10 iterations), I’d guess it’s probably fast enough for mobile platforms since they’ll run the search only once, not 10 times.
Maybe I don’t need to implement tries and can just make do with “contains”. Those “names” include surnames, so there shouldn’t be any string much longer than those present in the test code.
Any thoughts?

In the actual app, I plan to use a Stack of Dictionaries to be my “filteredList” and the “midFilter” will get its values from there. So, if the user adds another character to the input, the previous “filteredList” goes to the top of the stack, and a new one is created with the filters. Then, if the user decides to remove the character that was just added, the previous list will simply be popped from the stack, instead of running the filters again.

EDIT: adding a “ToLower()” everywhere to ignore casing doubles the processing time. Any way around that?