Use Case:
This is a bit more of a technical question, so bear with me as I fail to properly explain this. My use case is an Android styled RecyclerView that I’m attempting to make seamless. The values I’m storing are the heights of each item in the list which, as explained below, are not actually all stored.
Problem:
I am storing a massive amount of data, upwards to 10,000,000 elements. Now most of this data is actually the same, so I simplified it by defining a default and only keeping track of what deviates from that default. So an index can range from 0 to 10,000,000, and at each index is a value. Now say you are at index ‘i’, and you want to move to index ‘j’, it needs to iterate through every value stored between ‘i’ and ‘j’. Because we know our default value, which is actually only stored once, we only need to iterate through what’s been changed, and the rest can be quickly calculated. To do that though, we are going to need to find the next stored value from index ‘i’. For example, we may be starting at index 500 but the next changed value may not be until index 551, so we need a way to find where in our list of values the next item is located. From there, we can simply iterate through the list until we have passed ‘j’. Values are looked up extremely frequently by index, and moving from index ‘i’ to ‘j’ is also a common operation. Insertion, removal, and modification of these values happen less frequently, but still need to occur in real time.
Current Solution:
So far I’ve been using a Dictionary to store the values, using the index as the key. This works great since it’s fast to insert and retrieve, but it’s not great for iterating through. I could just get the list of keys and sort them, but the “move from ‘i’ to ‘j’” operation happens too frequently for me to justify the cost.
Thoughts:
Currently I’m thinking of using another data structure alongside the dictionary. I really need that O(1) when looking up an item by key, as it happens so frequently. It seems a bit dirty to be storing the indexes twice though. I’ve looked into SortedList and SortedDictionary, and I don’t think they’re a solid fit since they have a heavier lookup time. As for finding the nearest index, a BinarySearch on a list provides the next nearest element in O(logn) time, though if I used a standard list then adding a new value would take O(nlogn) time, since I would need to do a BinarySearch and Insert on the list.