When faster is actually slower

This is a follow-up to my post on code optimization.

Today I did some experimenting with hashes and binary-search-trees, specifically playing with QHash and QMap in C++.

According to the Qt containers algorithmic complexity documentation, QHash provides “significantly faster lookups” than QMap. Your intuition may confirm this, because you’ve heard that QHash uses a constant-time algorithm for lookups, while QMap uses a logarithmic algorithm. However, when it comes to performance questions, intuition is not the best guide.

When considering the QMap vs. QHash speed question, we have to consider everything these classes do when you are looking up an entry. Otherwise, you may be led to assume that QHash lookups are faster than QMap lookups in all cases.

How does QMap work?

Like an associative array or dictionary, the QMap class allows you to store values identified by keys. You can insert key/value pairs into the QMap and retrieve values by key. You can use any type as the value, but the key type must implement the < operator. This is because QMap looks up keys using a binary search through a sorted list.

How does QHash work?

QHash also provides the ability to insert and lookup values like QMap, but its implementation is quite different. Keys used in a QHash must implement the == operator and there must exist a global qHash(key) function for the key type. The QHash class hashes each key on lookup to find the position for each key’s value. This is a constant time operation (amortized). This means that the time it takes to lookup a key does not depend on the number of keys already stored in the QHash.

When is QHash slower than QMap?

Consider the following example:

QHash<QString, int> myHash;
QMap<QString,  int> myMap;

To insert the key “my favorite number” with the value 42 into myHash, the QHash has to first generate a hash code for “my favorite number”. To do so, it uses the qHash() function, which reads every character of the string “my favorite number”. This generates an index that is used to locate the position for this entry. Notice that qHash() must read every character of the string. The QMap, on the other hand, uses the < operator to compare the string “my favorite number” with all other keys in the QMap.

Now let’s suppose we filled myHash and myMap with 1 million short string keys (like a few characters each), and then did a few lookups.

I measured this case using QTime, and found that QHash lookups are about 3 times faster than QMap lookups in this scenario.

This is what I would expect after reading the documentation, but what if the key strings are bigger, like 200 characters each, and all quite distinct from each other?

I measured this case and found that QHash lookups are about 3 times slower than QMap lookups.

How can this be? The documentation says that QHash lookups are supposed to be faster.

The reason is this: Because QMap uses the < operator, it does not need to inspect every character of each key string. Instead, it compares the key string to other key strings, one character at a time, and stops if it discovers a difference before reaching the end of the string. This results in much less computation when locating a key’s value. So much less computation in fact, that even though QMap’s lookup algorithm is logarithmic, it is actually faster than QHash’s constant-time lookup algorithm on the same dataset.

I should offer the disclaimer that QHash will likely be faster than QMap for larger quantities of entries with that key size, but I was unable to test larger quantities due to the limited amount of RAM in my laptop.

Conclusion

When optimizing for speed, always consider all factors. Just because an algorithm is constant time does not mean that it is always faster than a non-constant time algorithm. And always remember to measure execution speed before assuming that one approach is faster than another.

Write a Reply or Comment

Your email address will not be published.