Best algorithm for a sorted list, which is fast to iterate through and find the nearest number

Use Case:
This is a bit of a technical question, so bear with me if I don't explain it well. My use case is an Android-style RecyclerView that I'm attempting to make seamless. The values I'm storing are the heights of each item in the list, which, as explained below, are not actually all stored.

Problem:
I am storing a massive amount of data, upwards of 10,000,000 elements. Most of these values are identical, so I simplified things by defining a default and only keeping track of the values that deviate from it. An index can range from 0 to 10,000,000, and each index has a value. Say you are at index 'i' and want to move to index 'j': naively, that means iterating through every value between 'i' and 'j'. Because the default is known (and stored only once), we only need to iterate over the values that have been changed, and the contribution of everything else can be calculated directly. To do that, though, we first need to find the next stored value at or after index 'i'. For example, we may start at index 500 while the next changed value isn't until index 551, so we need a fast way to locate that next entry in our list of stored values. From there, we can simply iterate through the list until we have passed 'j'.

Values are looked up by index extremely frequently, and moving from index 'i' to 'j' is also a common operation. Insertion, removal, and modification of these values happen less frequently, but still need to occur in real time.
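The "move from i to j" step described above could look roughly like this. It is a minimal sketch, assuming the stored indexes are also kept in a sorted List<int> next to the Dictionary; SparseHeightMath and SpanHeight are hypothetical names. The total starts as (j - i) * default and is then corrected only for the stored entries that fall in [i, j).

```csharp
using System;
using System.Collections.Generic;

public static class SparseHeightMath
{
    // Hypothetical helper: total height of the items in [i, j), given the sorted
    // indexes whose height differs from the default and a dictionary of those heights.
    public static float SpanHeight(
        List<int> sortedOverrideKeys,
        Dictionary<int, float> overrides,
        float defaultHeight,
        int i,
        int j)
    {
        if (j < i) throw new ArgumentException("j must be >= i");

        // Start by assuming every item in [i, j) has the default height.
        float total = (j - i) * defaultHeight;

        // First stored index >= i. List<T>.BinarySearch returns the bitwise
        // complement of the insertion point when the value is not present.
        int pos = sortedOverrideKeys.BinarySearch(i);
        if (pos < 0) pos = ~pos;

        // Visit only the stored entries before j and correct the total for each.
        for (; pos < sortedOverrideKeys.Count && sortedOverrideKeys[pos] < j; pos++)
        {
            int index = sortedOverrideKeys[pos];
            total += overrides[index] - defaultHeight;
        }
        return total;
    }
}
```

float is used to match GetItemHeight; for spans covering millions of items, a double accumulator would lose less precision.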

Current Solution:
So far I've been using a Dictionary to store the values, with the index as the key. This works great, since insertion and retrieval are fast, but it's not great for iterating in index order, because a Dictionary does not keep its keys sorted. I could get the list of keys and sort them, but the "move from 'i' to 'j'" operation happens too frequently to justify the cost of sorting each time.

Thoughts:
Currently I'm thinking of using another data structure alongside the dictionary. I really need O(1) lookup of an item by key, since it happens so frequently, though storing the indexes twice feels a bit dirty. I've looked into SortedList and SortedDictionary, and I don't think they're a good fit, since their lookup by key is O(log n) rather than O(1). As for finding the nearest index, a BinarySearch on a list finds the next nearest element in O(log n) time. With a standard list, though, adding a new value would take O(n) time: the BinarySearch itself is O(log n), but Insert has to shift every later element.
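The "second structure alongside the dictionary" idea could be sketched as follows, assuming the Dictionary stays the source of truth and a sorted List<int> mirrors its keys. SparseHeights and its members are hypothetical names; the only library behavior relied on is that List<T>.BinarySearch returns the bitwise complement of the insertion point when the value is missing.

```csharp
using System.Collections.Generic;

// Hypothetical class: keeps the O(1) Dictionary lookup for the hot path and a
// sorted copy of its keys so "next stored index" is a binary search.
public class SparseHeights
{
    private readonly Dictionary<int, float> heights = new Dictionary<int, float>();
    private readonly List<int> sortedKeys = new List<int>();

    public float DefaultHeight { get; }

    public SparseHeights(float defaultHeight) { DefaultHeight = defaultHeight; }

    // O(1) on average: a single hash lookup.
    public float Get(int index) =>
        heights.TryGetValue(index, out float h) ? h : DefaultHeight;

    // O(1) if the index is already stored; otherwise O(log n) to find the slot
    // plus O(n) for List.Insert to shift the later keys.
    public void Set(int index, float height)
    {
        if (heights.ContainsKey(index)) { heights[index] = height; return; }
        heights.Add(index, height);
        int pos = sortedKeys.BinarySearch(index); // negative: the key is new
        sortedKeys.Insert(~pos, index);
    }

    public bool Remove(int index)
    {
        if (!heights.Remove(index)) return false;
        sortedKeys.RemoveAt(sortedKeys.BinarySearch(index)); // key is known to exist
        return true;
    }

    // Smallest stored index >= index, or -1 if there is none.
    public int NextStoredAtOrAfter(int index)
    {
        int pos = sortedKeys.BinarySearch(index);
        if (pos < 0) pos = ~pos;
        return pos < sortedKeys.Count ? sortedKeys[pos] : -1;
    }
}
```

The O(n) insert is a block memory move over the stored keys only, not all 10,000,000 indexes, which may be acceptable if changes are rare.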

Wait… what are you trying to do?

Do you need a sorted list of entries, where you sort on height? But the height can change, so it might need to move around in the list and remain sorted?

I’m not sure what this index is for… why are you using an index as a Dictionary key? Why not just… use an array? When you're in an array, the index is your key.

Or is it that you have two collections, one a list of objects and the other a Dictionary, and you're using the index from the first as the key for the second?

Instead of describing what you've done… could you describe what you're ultimately trying to accomplish?

You are trying to brute-force something where a more indirect method is called for, such as using an actual database with proper indexing and running the queries over HTTP, or an embedded database, or, simpler still, rethinking the UI.

I knew I would botch this explanation. The index determines which item the height applies to, and the data is sorted by index. Say we have 100,000 items in our list, with a default height of 50. When you scroll through the list, it needs to check the heights of the upcoming items. That's where the index comes in: it tells us which item a height belongs to. Items that don't have a height specified just use the default height, so we don't even store a value in the dictionary for them. In fact, here is the function I use to get the height of an item:

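public float GetItemHeight(int index)
{
    // TryGetValue does one hash lookup instead of two (ContainsKey + indexer);
    // items with no stored entry fall back to the shared default height.
    return itemHeights.TryGetValue(index, out float height) ? height : defaultItemHeight;
}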

A RecyclerView is like a ScrollRect, but it's designed to hold an enormous, virtually infinite list of items. It does this by virtualizing the data for each element and recycling the objects used to display them, so it isn't constantly creating and destroying them. So when I scroll down by 1,000, it checks whether each element's new position would be outside the viewport, and if so, pushes it back into the object pool. Then it has to determine where to place the elements you can see, and indeed what the indexes of those elements even are. Since it only keeps track of the position and index of the items currently in view, it has to calculate the positions and indexes of the new items relative to the old ones. To do this, the only thing it really needs is the height of each item.
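The recycling described above can be sketched as a minimal generic pool. RowPool is a hypothetical name, not a Unity or Android API, and a real RecyclerView would also rebind each reused row to its new index and position.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool: rows that scroll out of the viewport are released here,
// and rows scrolling into view reuse them instead of allocating new objects.
public class RowPool<T> where T : class
{
    private readonly Stack<T> free = new Stack<T>();
    private readonly Func<T> create;

    public RowPool(Func<T> create) { this.create = create; }

    // Number of objects actually constructed (useful for checking reuse).
    public int CreatedCount { get; private set; }

    // Reuse a released row if one is available, otherwise create a new one.
    public T Get()
    {
        if (free.Count > 0) return free.Pop();
        CreatedCount++;
        return create();
    }

    // Called when a row leaves the viewport.
    public void Release(T item) => free.Push(item);
}
```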

For the sake of example, I'm going to assume we are using a dictionary to store the heights. Say we are currently looking at item 50,000 in our list, and we want to scroll down by 1,000 (note that this is a distance, not 1,000 items, so the scroll is consistent). To brute-force it, we would keep checking the height of every item until we had used up the full 1,000, then display the items that would be in view. Obviously that's really costly, so instead we take our current index (50,000) and find the next key at or after it in our dictionary. To do this, let's copy our dictionary keys into a sorted array, since we may need to iterate through them later. If none is found, we can assume the default height for every element. If one is found, we must also take it into account when positioning the other elements. This means that instead of iterating through every item along the way, we only iterate through the items whose height differs from the default.
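That scroll step could be sketched like this, under a few assumptions: the overridden indexes are kept sorted in a List<int> next to the Dictionary, scrolling starts at the top edge of item start, distance is non-negative, and every height (including the default) is positive. ScrollMath and ScrollDown are hypothetical names. Runs of default-height items are skipped with a division rather than a loop, so the work is proportional to the number of stored heights crossed.

```csharp
using System;
using System.Collections.Generic;

public static class ScrollMath
{
    // Hypothetical helper: starting at the top edge of item `start`, consume `distance`
    // (>= 0) of scroll and report which item is now at the top and how far into it we are.
    // index == itemCount means the scroll ran past the last item by offsetInItem.
    public static (int index, float offsetInItem) ScrollDown(
        List<int> sortedOverrideKeys,
        Dictionary<int, float> overrides,
        float defaultHeight,
        int itemCount,
        int start,
        float distance)
    {
        int index = start;
        float remaining = distance;

        // First stored index >= start (BinarySearch returns ~insertionPoint if absent).
        int pos = sortedOverrideKeys.BinarySearch(start);
        if (pos < 0) pos = ~pos;

        while (index < itemCount)
        {
            // Items in [index, nextOverride) all have the default height.
            int nextOverride = pos < sortedOverrideKeys.Count
                ? Math.Min(sortedOverrideKeys[pos], itemCount)
                : itemCount;
            int run = nextOverride - index;
            float runHeight = run * defaultHeight;

            if (remaining < runHeight)
            {
                // We land inside the default run; the Min guards against float rounding.
                int skip = Math.Min((int)(remaining / defaultHeight), run - 1);
                return (index + skip, remaining - skip * defaultHeight);
            }

            remaining -= runHeight;
            index = nextOverride;
            if (index >= itemCount) break;

            // Step over the single item that has a stored height.
            float h = overrides[index];
            if (remaining < h) return (index, remaining);
            remaining -= h;
            index++;
            pos++;
        }
        return (itemCount, remaining);
    }
}
```

Clamping at the end of the list is left to the caller, which can check for index == itemCount.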

Does this make more sense?

This is what databases are for. It's a well-solved problem. RecyclerView doesn't imply that you should store huge amounts of data in memory. When to use a database versus a simple in-memory cache is completely orthogonal to RecyclerView.

You can solve this with low-level indexes like a B-tree if you really want, although that's kind of crazy, since there are so many off-the-shelf solutions you could just plug in.
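If you do go the low-level route, an ordered tree index can be approximated in memory without writing a tree by hand. .NET's SortedSet<int> is implemented as a red-black tree (a balanced binary tree, not a B-tree), so adding or removing a key is O(log n) instead of a List.Insert shift. The sketch below pairs it with the existing Dictionary; TreeIndexedHeights is a hypothetical name, and because the cost of SortedSet.GetViewBetween has differed between runtimes, it is worth profiling on the runtime you actually ship.

```csharp
using System.Collections.Generic;

// Hypothetical class: Dictionary for O(1) height lookup, SortedSet<int> as an
// ordered index over the indexes whose height differs from the default.
public class TreeIndexedHeights
{
    private readonly Dictionary<int, float> heights = new Dictionary<int, float>();
    private readonly SortedSet<int> keys = new SortedSet<int>();
    private readonly float defaultHeight;

    public TreeIndexedHeights(float defaultHeight) { this.defaultHeight = defaultHeight; }

    public float Get(int index) =>
        heights.TryGetValue(index, out float h) ? h : defaultHeight;

    public void Set(int index, float height)
    {
        heights[index] = height;
        keys.Add(index); // returns false (no-op) if the key is already present
    }

    public void Remove(int index)
    {
        if (heights.Remove(index)) keys.Remove(index);
    }

    // Stored indexes in [from, to] (inclusive), in ascending order.
    // GetViewBetween throws ArgumentException if from > to.
    public IEnumerable<int> StoredBetween(int from, int to) => keys.GetViewBetween(from, to);
}
```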
