A C++ implementation of a fast hash map and hash set using hopscotch hashing

(github.com)

88 points by gjvc 10 hours ago

4 comments

nly 1 hour ago
My go-to these days (and, afaik, the state of the art) is boost::unordered_flat_set paired with rapidhash for hashing, since the GNU std::hash functions based on MurmurHash are ridiculously slow. The cache-line performance is pretty hard to beat (a SIMD-optimised linear scan before hopping), which is where all the wins come from in the real world. But basically any of the faster hash maps from absl, Boost, or Folly are going to wreck the standard library in terms of perf.
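As a rough illustration of pairing a container with a non-default hasher: the sketch below plugs a hypothetical Fnv1aHash functor into std::unordered_set through its Hash template parameter. FNV-1a is only a simple stand-in here; it is not rapidhash, and rapidhash's own API is deliberately not reproduced. boost::unordered_flat_set takes a hasher through its Hash template parameter in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_set>

// 64-bit FNV-1a: a simple stand-in for a fast byte hash (NOT rapidhash).
constexpr std::uint64_t fnv1a64(std::string_view bytes) {
    std::uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
    for (unsigned char c : bytes) {
        h ^= c;                 // mix in one byte
        h *= 1099511628211ull;  // FNV-1a 64-bit prime
    }
    return h;
}

// Hypothetical hasher, passed as the container's Hash template argument.
struct Fnv1aHash {
    std::size_t operator()(std::string_view s) const noexcept {
        return static_cast<std::size_t>(fnv1a64(s));
    }
};

// A std::string set that uses Fnv1aHash instead of std::hash<std::string>.
using StringSet = std::unordered_set<std::string, Fnv1aHash>;
```

Swapping in a faster hash is then a change to the functor alone; whether it pays off depends on key sizes and distribution, and should be measured.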
jll29 6 hours ago
google::dense_hash_map is faster than this new implementation according to their benchmark's diagram (google::dense_hash_map has the lowest runtime of all tested methods).
mgaunard 9 hours ago
How does it compare to boost unordered flat map? Looks like the benchmarks were last updated in 2019.
- compiler-guy 8 hours ago
 https://tessil.github.io/2016/08/29/benchmark-hopscotch-map.html has some older benchmarks, including those two.
 - mgaunard 5 hours ago
 boost unordered flat map didn't exist in 2016 (nor 2019).
 - jeffbee 8 hours ago
 A more recent benchmark is https://martin.ankerl.com/2022/08/27/hashmap-bench-01/ but it lacks the newer Boost stuff, which is very fast. The hopscotch map was interesting at the time, but due to unfortunate timing it was immediately outshone by absl::unordered_flat_map, a.k.a. "Swiss tables", and there's been even more water under the bridge since then.
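For readers unfamiliar with Swiss tables, the core trick described in Abseil's design notes is a separate array of one-byte control words: the hash is split into H1, which picks where probing starts, and a 7-bit H2 fingerprint stored per slot, and a whole group of control bytes is compared against H2 at once with SIMD. The sketch below is a simplified scalar emulation under those assumptions (16-slot groups, a single kEmpty marker, no deleted or sentinel states); the names are illustrative, not Abseil's API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified Swiss-table-style control bytes (illustrative, not Abseil's code).
constexpr std::uint8_t kEmpty = 0x80;    // high bit set: slot holds no key
constexpr std::size_t kGroupWidth = 16;  // one 128-bit SIMD register of control bytes

using Group = std::array<std::uint8_t, kGroupWidth>;

// H1 chooses where probing starts; H2 is the 7-bit fingerprint kept per slot.
constexpr std::uint64_t H1(std::uint64_t hash) { return hash >> 7; }
constexpr std::uint8_t H2(std::uint64_t hash) {
    return static_cast<std::uint8_t>(hash & 0x7F);
}

// Bit i of the result is set when control byte i equals `tag`. SIMD versions
// compute the same mask with a byte compare followed by a movemask;
// this scalar loop only emulates that.
inline std::uint32_t match(const Group& g, std::uint8_t tag) {
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < kGroupWidth; ++i)
        if (g[i] == tag) mask |= 1u << i;
    return mask;
}

// Slots in this group that are free for a new key.
inline std::uint32_t match_empty(const Group& g) { return match(g, kEmpty); }
```

Only slots whose bit is set by match() need a full key comparison, which is why most probes touch just the control bytes. Because H2 never exceeds 0x7F, a full slot can never be mistaken for kEmpty.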
 - RossBencina 7 hours ago
 Abseil's Swiss Tables carefully avoid intermediate allocations and copy-constructor calls [1]. I'd be wary of inferring underlying algorithm performance from benchmarks that don't explicitly control for these optimisations. (Or maybe everyone is using them and I'm out of touch.) [1] https://abseil.io/about/design/swisstables
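One concrete source of the intermediate copies mentioned above can be observed even with the standard library. The sketch below uses std::unordered_map (not Abseil) and a hypothetical Counted type that counts its own constructions; it assumes C++17 for try_emplace and inline static data members.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical value type that counts every construction (including copies/moves).
struct Counted {
    static inline int constructions = 0;
    std::string payload;
    explicit Counted(std::string p) : payload(std::move(p)) { ++constructions; }
    Counted(const Counted& o) : payload(o.payload) { ++constructions; }
    Counted(Counted&& o) noexcept : payload(std::move(o.payload)) { ++constructions; }
};

// Inserting a pre-built pair for a key that already exists: the value is
// still constructed (and moved into the pair) before being thrown away.
inline int constructions_insert_existing() {
    std::unordered_map<int, Counted> m;
    m.try_emplace(1, "first");
    Counted::constructions = 0;
    m.insert(std::make_pair(1, Counted("dup")));
    return Counted::constructions;
}

// try_emplace for a key that already exists: no value is constructed at all.
inline int constructions_try_emplace_existing() {
    std::unordered_map<int, Counted> m;
    m.try_emplace(1, "first");
    Counted::constructions = 0;
    m.try_emplace(1, "dup");
    return Counted::constructions;
}
```

The exact count for insert can vary between standard library implementations, but it is at least two here, while try_emplace is required to leave its arguments untouched when the key exists. Benchmarks that mix call styles across libraries can end up measuring API usage rather than the probing scheme.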
 - jeffbee 6 hours ago
 Algorithmically hopscotch has a better strict worst case whereas swiss tables have a degenerate O(N) lookup. But there are a lot of maps like that. robin_hood::flat_hash_map is very fast but I can create insert sequences under which it will call std::abort, which I feel is ridiculous. But if your hash map isn't exposed to hostile inputs then you might not be concerned.
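To make the hopscotch worst-case bound concrete, here is a minimal sketch of the idea, not tsl::hopscotch_map's implementation: every key is kept within a fixed neighborhood of kH slots from its home bucket, tracked by a per-bucket bitmap, so a lookup inspects at most kH slots. Insertion linearly probes for a free slot and then "hops" it backwards by moving entries that can legally move forward. Assumptions: uint64_t keys, fixed power-of-two capacity, no deletion or resizing, and kH = 8 chosen arbitrarily; HopscotchSet is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Minimal hopscotch set (sketch). Invariant: each key lies within kH slots of its home.
class HopscotchSet {
public:
    static constexpr std::size_t kH = 8;  // neighborhood size (arbitrary for this sketch)

    explicit HopscotchSet(std::size_t capacity_pow2)
        : mask_(capacity_pow2 - 1), slots_(capacity_pow2), hop_(capacity_pow2, 0) {
        assert(capacity_pow2 >= kH && (capacity_pow2 & mask_) == 0);
    }

    // Checks only the slots flagged in the home bucket's bitmap: at most kH probes.
    bool contains(std::uint64_t key) const {
        const std::size_t home = home_of(key);
        for (std::size_t i = 0; i < kH; ++i)
            if ((hop_[home] >> i & 1u) && *slots_[(home + i) & mask_] == key) return true;
        return false;
    }

    // Returns false for duplicates, or when no slot can be freed inside the
    // neighborhood (a real table would rehash and grow at that point).
    bool insert(std::uint64_t key) {
        if (contains(key)) return false;
        const std::size_t home = home_of(key);
        std::size_t dist = 0;  // 1. linear probe for the first empty slot
        while (dist <= mask_ && slots_[(home + dist) & mask_]) ++dist;
        if (dist > mask_) return false;  // table full
        while (dist >= kH) {             // 2. hop the empty slot back toward home
            const std::size_t free_idx = (home + dist) & mask_;
            bool moved = false;
            // Buckets whose neighborhood covers free_idx, farthest first.
            for (std::size_t back = kH - 1; back > 0 && !moved; --back) {
                const std::size_t b = (free_idx - back) & mask_;
                for (std::size_t i = 0; i < back; ++i) {
                    if (!(hop_[b] >> i & 1u)) continue;
                    const std::size_t from = (b + i) & mask_;
                    slots_[free_idx] = slots_[from];  // move entry forward, still near b
                    slots_[from].reset();
                    hop_[b] = (hop_[b] & ~(1u << i)) | (1u << back);
                    dist -= back - i;                 // the empty slot is now `from`
                    moved = true;
                    break;
                }
            }
            if (!moved) return false;
        }
        slots_[(home + dist) & mask_] = key;
        hop_[home] |= 1u << dist;
        return true;
    }

private:
    std::size_t home_of(std::uint64_t key) const {
        const std::uint64_t h = key * 0x9E3779B97F4A7C15ull;  // multiplicative mixing
        return static_cast<std::size_t>(h ^ (h >> 32)) & mask_;
    }
    std::size_t mask_;
    std::vector<std::optional<std::uint64_t>> slots_;
    std::vector<std::uint32_t> hop_;  // bit i: slot home+i holds a key homed here
};
```

When no entry can be moved, insert() reports failure; a real table resizes there, which is where the bounded-lookup guarantee is paid for. Since every probe sequence is capped at kH, adversarial inputs degrade into rehashing rather than into long scans.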
 - utopcell 2 hours ago
 You probably mean absl::flat_hash_map<>.
 - quadrature 8 hours ago
 Is there something better than Swiss tables?
 - reinitctxoffset 7 hours ago
 On modern, super-wide Zen 5 (znver5) or SBSA parts, with full-clock 256- or 512-bit ALUs/SIMD lanes, deep pipelines, high BTB pressure, etc., it's just really difficult to make a priori statements about performance for a given workload. absl::flat_hash_map (or folly::F14) are great defaults if you can eat the invalidation semantics. But if it's really hot, you measure by workload and have infrastructure to flag the right ones in. This seems promising. I'll start benching it alongside the other likely lads.
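A bare-bones version of "measure by workload" might look like the hypothetical harness below: it is templated on the map type so that any container with emplace, find and end can be dropped in, and it returns the hit count so the lookups cannot be optimised away. std::unordered_map and std::map stand in for the real candidates here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

struct BenchResult {
    std::size_t hits;                  // returned so the lookup loop has an observable result
    std::chrono::nanoseconds elapsed;  // time spent in the lookup loop only
};

// Hypothetical harness: builds a Map from `keys`, then times `lookups` against it.
template <class Map>
BenchResult time_lookups(const std::vector<std::uint64_t>& keys,
                         const std::vector<std::uint64_t>& lookups) {
    Map m;
    for (std::uint64_t k : keys) m.emplace(k, k);

    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (std::uint64_t k : lookups) hits += (m.find(k) != m.end()) ? 1 : 0;
    const auto stop = std::chrono::steady_clock::now();

    return {hits, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

// Candidate types; replace or extend with the tables under evaluation.
using StdUnordered = std::unordered_map<std::uint64_t, std::uint64_t>;
using StdOrdered = std::map<std::uint64_t, std::uint64_t>;
```

The keys and the lookup mix should come from the real workload; a synthetic loop like this is only a first filter before profiling in place.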
 - szmarczak 7 hours ago
 No. Fundamentally it's not possible to be faster.
 infamouscow 7 hours ago
 This is not true. It is fast as a general-purpose hash table, but claiming it's the fastest across all datasets and workloads is silly.
teo_zero 3 hours ago
The concept is very similar to Robin Hood hashing. In fact, most of the performance charts show that the curves for hopscotch and Robin Hood are very close. I think I'd prefer Robin Hood, as it's well known.
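For comparison, Robin Hood hashing keeps plain linear probing but, on insert, lets an entry that has probed farther than the resident it meets take that slot, pushing the resident onward. A minimal sketch, assuming uint64_t keys, a fixed power-of-two capacity and no deletion or resizing; RobinHoodSet is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Minimal Robin Hood set (sketch): linear probing with displacement on insert.
class RobinHoodSet {
public:
    explicit RobinHoodSet(std::size_t capacity_pow2)
        : mask_(capacity_pow2 - 1), slots_(capacity_pow2) {
        assert(capacity_pow2 > 0 && (capacity_pow2 & mask_) == 0);
    }

    bool contains(std::uint64_t key) const {
        std::size_t idx = home_of(key);
        for (std::size_t dist = 0; dist <= mask_; ++dist, idx = (idx + 1) & mask_) {
            const auto& s = slots_[idx];
            // Empty slot, or a resident closer to its home than we are to ours:
            // insert would have displaced it, so the key cannot be further along.
            if (!s || distance(*s, idx) < dist) return false;
            if (*s == key) return true;
        }
        return false;
    }

    // Returns false for duplicates or when the table is full (no resizing here).
    bool insert(std::uint64_t key) {
        if (contains(key) || size_ > mask_) return false;
        std::size_t idx = home_of(key);
        std::size_t dist = 0;  // how far the carried key is from its home
        for (;;) {
            auto& s = slots_[idx];
            if (!s) { s = key; ++size_; return true; }
            const std::size_t resident = distance(*s, idx);
            if (resident < dist) {  // resident is "richer": take its slot, carry it on
                std::swap(*s, key);
                dist = resident;
            }
            idx = (idx + 1) & mask_;
            ++dist;
        }
    }

private:
    std::size_t home_of(std::uint64_t key) const {
        const std::uint64_t h = key * 0x9E3779B97F4A7C15ull;  // multiplicative mixing
        return static_cast<std::size_t>(h ^ (h >> 32)) & mask_;
    }
    // Probe distance of `key` if it sits at slot `idx` (wraps around the table).
    std::size_t distance(std::uint64_t key, std::size_t idx) const {
        return (idx - home_of(key)) & mask_;
    }
    std::size_t mask_;
    std::size_t size_ = 0;
    std::vector<std::optional<std::uint64_t>> slots_;
};
```

The early exit in contains() is what keeps unsuccessful lookups short. The practical difference from hopscotch is that Robin Hood's probe lengths emerge from the displacement rule rather than being capped by an explicit neighborhood.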