I am currently parsing 3D geometry data at runtime and it’s actually working really well except for one thing.
When I import large meshes, the loading time is huge because of the vertex merging I do.
I have a face class which stores information about the current face I am parsing.
When creating that face I check whether any of the vertices used for that face is equal to a vertex already in my mesh. If that’s the case I check whether the uv and normal are also the same, and if they are I reuse that vertex instead of adding a new one.
But checking the whole vertex array for thousands of vertices takes extremely long (~4 minutes compared to ~7 seconds without vertex merging).
This approach is super slow; there has to be a faster way, but I have no idea what it is.
You should use a Dictionary for this purpose. I’ve done the same thing in several projects; it works. The only caveat is that, in principle, you could get two Vector3’s which are equivalent and yet don’t hash to the same thing, and so fail to match up. But in practice, all your Vector3’s here come from a file and go through the same parsing steps, so it is almost certainly not going to be a problem.
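For illustration, a dictionary-based version could look something like this. It’s only a sketch: the VertexKey struct, the list names and GetOrAddVertex are assumptions for the example, not your actual code.

using System;
using System.Collections.Generic;
using UnityEngine;

// Key struct bundling everything that has to match for a vertex to be reused.
struct VertexKey : IEquatable<VertexKey>
{
    public Vector3 position;
    public Vector2 uv;
    public Vector3 normal;

    public bool Equals(VertexKey other)
    {
        // Exact comparison, so it stays consistent with GetHashCode below.
        return position.Equals(other.position)
            && uv.Equals(other.uv)
            && normal.Equals(other.normal);
    }

    public override bool Equals(object obj)
    {
        return obj is VertexKey && Equals((VertexKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = position.GetHashCode();
            hash = hash * 31 + uv.GetHashCode();
            hash = hash * 31 + normal.GetHashCode();
            return hash;
        }
    }
}

class MeshBuilder
{
    List<Vector3> vertices = new List<Vector3>();
    List<Vector2> uvs = new List<Vector2>();
    List<Vector3> normals = new List<Vector3>();
    Dictionary<VertexKey, int> lookup = new Dictionary<VertexKey, int>();

    // Returns the index of an existing identical vertex, or appends a new one.
    public int GetOrAddVertex(Vector3 position, Vector2 uv, Vector3 normal)
    {
        var key = new VertexKey { position = position, uv = uv, normal = normal };

        int index;
        if (lookup.TryGetValue(key, out index))
            return index;           // same position/uv/normal is already in the mesh

        index = vertices.Count;     // otherwise append it and remember where it went
        vertices.Add(position);
        uvs.Add(uv);
        normals.Add(normal);
        lookup.Add(key, index);
        return index;
    }
}

The point is that the lookup is a single hash probe instead of a scan over every vertex parsed so far.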
public int Merge(Vector3 vertex, Vector2 uv, Vector3 normal) {
    if (vertices.Contains(vertex)) {
        for (int i = 0; i < vertices.Count; i++) {
            if (vertices[i].Equals(vertex)) {
                if (uvs[i].Equals(uv) && normals[i].Equals(normal)) {
                    return i;
                }
                else {
                    continue;
                }
            }
            else {
                continue;
            }
        }
        return -1;
    }
    else {
        return -1;
    }
}
You’re doing a double search in this code: you first check if it contains, then you loop through.
The thing is, Contains on what appears to be an array or list itself loops through each entry and compares. You’re effectively doing that work twice, saying “if looping through and testing equality finds a result, then loop through and test equality again”.
This also appears to test only a single vertex; I’m assuming there’s other code for testing ALL vertices, which can probably be optimized as well.
Furthermore, the name of this function is weird… this function doesn’t merge at all. It merely finds a vertex that matches the supplied vertex.
So let’s just shorten this bad boy up a little bit:
public int FindMatchingVertex(Vector3 vertex, Vector2 uv, Vector3 normal)
{
    for (int i = 0; i < vertices.Count; i++)
    {
        if (vertices[i].Equals(vertex) && uvs[i].Equals(uv) && normals[i].Equals(normal))
            return i;
    }
    return -1;
}
Of course, this could probably be sped up much, much more with a hash table for the lookup (a dictionary could do this, like JoeStrout suggested), but I don’t know your entire merging source code, so I can’t optimize it well without seeing exactly what you need.
My posted code and your improved code make almost no difference. Both were super slow for me.
But using dictionaries worked really well (4 seconds loading time compared to 5 minutes).
But what exactly is the difference between dictionaries and lists?
In both approaches I search the list/dictionary for a certain object, yet the time they need is so different.
Yeah, I didn’t expect it to be that much faster, because the entire approach was naive. I merely cleaned up what was otherwise sloppy code.
Lists are an ordered sequence of elements.
Dictionary and HashSet use hashed sets; sets are an unordered collection.
Basically, imagine you have an array of length N, where N is a prime number near the size of your collection. In the case of a dictionary, the ‘key’ is what gets hashed (in my second example code above it’s the Vertex struct). That key has ‘GetHashCode’ called on it (a method all objects have in .NET), or the GetHashCode of an IEqualityComparer supplied to the collection (for custom hashing).
This hash value is just an integer that is associated with that object. It doesn’t have to be unique, but it does have to always be the same for that object. If I have a string “ABC” and its hashcode happens to be 0xFF1299, then it should always be that (on that system).
Now you take that hash value and modulo it by N. This gives you your index within the array of length N.
If two objects collide, you put them in what is called a ‘bucket’. Basically it’s a short linked list of items at that index of the array. The thing is, there should be very few hashcode collisions (that’s why N should be prime).
As a result, lookups are super fast. You don’t have to traverse the entire collection trying to find the item. Instead you compute the index it lives at from its hashcode, and at worst you get a linked list of maybe 3 entries to compare against, as opposed to the thousands you would have if you traversed the entire collection.
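To make the index calculation concrete, here is a toy sketch of the idea (not how Dictionary is actually implemented internally, just the principle):

static class HashDemo
{
    // Toy version of the bucket-index calculation.
    public static int BucketIndex(object key, int bucketCount)
    {
        int hash = key.GetHashCode();                 // deterministic int for the key
        return (hash & 0x7FFFFFFF) % bucketCount;     // clear the sign bit, wrap into 0..bucketCount-1
    }
}

// e.g. HashDemo.BucketIndex("ABC", 31) says which of the 31 buckets "ABC" lands in;
// entries that share a bucket are chained and resolved with Equals().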
Note, in my example code above, I used Dictionary where the vertex was a key, and the value was the index in the original collection.
BUT, if that index isn’t needed and you’re just shoving garbage in the ‘value’ part of the dictionary, you could use a ‘HashSet’ instead.
It’s basically the ‘key’ part of a dictionary, without the ‘value’ part. It takes up less memory that way.
This can be perceived as a bag… it’s unordered, and it just contains stuff. You can easily put stuff in the bag and check whether the bag contains something, but when you loop over the stuff in the bag you’re just grabbing things out one at a time; you don’t know what order they’ll come out in. This is great when the order doesn’t matter and you just need to hold stuff.
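For example, just to show the shape of it:

using System.Collections.Generic;
using UnityEngine;

static class HashSetDemo
{
    static void Demo()
    {
        var seen = new HashSet<Vector3>();

        seen.Add(new Vector3(1f, 2f, 3f));
        seen.Add(new Vector3(1f, 2f, 3f));    // duplicate, so the set still holds one entry

        // Contains is a hash lookup, not a scan of the whole collection.
        bool alreadyThere = seen.Contains(new Vector3(1f, 2f, 3f));    // true
    }
}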
Here you can find a full breakdown of hash tables (a dictionary):
Honestly, the best way to think about it is like a filing cabinet.
When you have a filing cabinet, you sort by some hash so as to make finding files in it faster. The simplest method might be grouping by ‘first letter of surname’. So if you need to look up ‘John Smith’, you’d go to the ‘S’ drawer. Now you don’t have to search the ENTIRE filing cabinet, only the ‘S’ drawer.
The more specific your hash, the shorter the search. If you grouped by the first 3 characters of the last name, you’d go to your ‘Smi’ drawer. Of course, this more specific grouping only pays off as your filing cabinet gets larger. Having a single drawer for ‘Smi’ when you have only 15 files is ridiculous, but when you have 1 million files it makes way more sense.
The only difference between this and the hash in .NET/Mono is that there we turn it into an integer rather than some specialized string/word, because computers prefer numbers to words, whereas humans prefer words to numbers. But that’s fine: integers are simple, their hash is themselves, whereas a string’s might be an XOR-and-shift of all of its characters’ ASCII values. A short, simple algorithm to calculate a number… just like finding the human hash of ‘John Smith’ is fast, we can look at the value and determine the hash lickety-split.
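A toy version of that kind of string hash (purely illustrative, not the actual .NET algorithm) might look like:

static class ToyStringHash
{
    // Shift-and-XOR each character into the running hash: cheap to compute,
    // deterministic, and usually reasonably well spread out.
    public static int Hash(string s)
    {
        int hash = 17;
        foreach (char c in s)
            hash = (hash << 5) ^ hash ^ c;
        return hash;
    }
}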