Maybe don't use Blake3 on Short Inputs

I needed to hash some relatively short data (the length of an English word) and couldn't really find any benchmarks that fit my purpose. So, I ran them myself, and some of the results were actually kind of surprising to me.

15 January 2024

Motivation

I have needs. Very specific needs. Namely, I want:

  1. To hash a corpus of English words, individually
  2. …using a cryptographic hash
  3. …quickly
  4. …in Python.

Putting aside that this is a strange thing to want to do, I’ve compiled a few benchmarks to help me chose the best function for my purposes. I’ve found plenty of benchmarks for all these hash functions, but they’re almost always for very large inputs.

These benchmarks might be useful to you if you are in the same (unlikely) situation as I, but also, if you’re interested in knowing how hash functions perform on data smaller than a kilobyteOn my particular PC..

The Belligerents

These are the cryptographic hash functions we’ll be benchmarking:

Hash Function Implementation Digest Size Construction Release Year
Blake2B hashlib ≤64 bytes HAIFA 2012
Blake2s hashlib ≤32 bytes HAIFA 2012
Blake3 blake3-py 32 bytes, extensible HAIFA, Merkle tree 2020
MD5 hashlib 16 bytes Merkle–Damgård 1992
SHA-1 hashlib 20 bytes Merkle–Damgård 1995
SHA-256 hashlib 32 bytes Merkle–Damgård 2001
SHA-3 hashlib arbitrary (here, 32 bytes) Sponge 2016
Skein PySkein arbitrary (here, 32 bytes) Series of Unique Block Iterations using the Threefish cipher. 2016

I also had a passing curiosity on how non-cryptographic hashes might compare. These are the ones I compared:

Hash Function Implementation Digest Size Construction Release Year
Metrohash metrohash 8 or 16 bytes Algorithmically generated. Vaguely explained. 2015
MurMurHash3 mmh3 8 or 16 bytes (here, 16) product/rotation 2012
xxHash xxhash 8, 16, or 32 bytes (here, 16 and 32) product/rotation 2014?

As much as possible, I tried to use the built-in hashlib library. The blake3 library is a thin wrapper over the official Rust implementation of Blake3, and beside being the only Python implementation I can find, much of PySkein is implemented in C and is optimized for performance.

The Word Benchmark

Our first benchmark is the “Word” benchmark, wherein individual words from three different corpora are hashed using individual hash functions.

The different corpora are:

Filename Description Link
odessey.txt Homer’s Odyssey as found on Project Gutenberg. Download
wikinews.txt A dataset of 365 English WikiNews articles from 2004 to 2014. The dataset is truncated to 1Mb. Download
sci.crypt.txt All messages from the sci.crypt newsgroup in the 20 Newsgroups dataset. The dataset is truncated to 1Mb. Download

I chose the different datasets because they were created very differently, at different times, for different purposes. And because they were easy to find. You can see below that they differ quite a bit with respect to the length and number of words in each file. The Odyssey is a shorter text with smaller words. sci.crypt has an unsual amount of 1 and 2 letter words, which is probably emoticons and symbols in the headers of the messages. In all cases, ostensibly all words are under 20 characters long. That means that 80 bytes is as about as large as the input to the hash function will beAll input is UTF-8 encoded..

A histogram of word lengths in all three corpora. A bar plot of the number of words by corpus.

Figure 1. Properties of the different corpora. The first panel shows the histogram of word lengths, while the second shows the number of words in each corpus.

What we see in the results below is that, as expected, the Odyssey corpus have consistantly smaller hash times than the other two. This is followed by WikiNews and sci.crypt.

Among the cryptographic hash functions, the Blake2 family takes the cake, with Blake2s out-performing all but one of the non-cryptographic functions.

Among the non-cryptographic hash functions, ho-boy is MurMurHash3 fast. That said, all the hash functions here are actually very fast in absolute terms (as far as I’m concerned).

Blake3 and SHA-3’s dead-last performance is a bit surprising to me. Blake3 especially has a reputation for being the fastest gun in the west, and by quite some margin in the benchmarks I’ve seen. What’s more is that its little brothers do so well, which is a bit weird considering they share so much DNA. But I guess I shouldn’t be too surprised, it’s well known that hashes perform differently on short inputs (hence this benchmark).

A dot plot showing hash times for several cryptographic hash functions on multiple corpora. A dot plot showing hash times for several non-cryptographic hash functions on multiple corpora.

Figure 2. Each dot represents the time it takes to individually hash all the words in the corpus indicated by the colour. One hundred replicates for each corpus are computed and plotted here.

The Random Benchmark

I wanted to see how hash performance changed with input size, so I began by hashing random input generated by Python’s built-in random.randbytes functionThis uses the Mersenne Twister, a non-cryptographic pseudo-random number generator. in the interval of zero bytes to one gigabyte:

A chart showing cryptographic hash times as a function of the size of input data from zero to one gigabyte. A chart showing non-cryptographic hash times as a function of the size of input data from zero to one gigabyte.

Figure 3. The performance of different hash functions as a function of input size over the range of one gigabyte. Shown here is the average of 10 replicates.

Well… uh…, we can’t really make out any input-size-dependent differences in relative hash performance. It’s still useful however. Blake3 lives up to its reputation, being nearly five-fold faster. I was a bit surprised to see SHA-1 and SHA-256 performing so closely, because I feel like I’ve often heard people belly-ache about SHA-256 performance when having to migrate from SHA-1 for security reasons. Also, SHA-1 is a lot faster than these benchmarks from the Blake2 team. Then I realized that my CPU supports the SHA instruction set, which accelerates both SHA-1 and SHA-256 in hardware.

SHA-3 has caught flack for being very slow in software, and we definitely see that here. I can’t wait for the apparently “blazing fast” hardware implementations, because we see here what wonders they did for SHA-256. For my purposes, however, it’s not useful.

So anywho, can we see those input-size-dependent performance changes between 0 and 1 kilobytes?

Figure 4. The performance of different hash functions as a function of input size over the range of one kilobyte. Shown here is the average of 10 replicates.

Aha! Another story emerges at this resolution. We can see that Blake2s loses its lead when inputs start getting larger than 100 bytes. SHA-1 and Blake2b are pretty similar up until around 500 bytes when Blake2b begins getting slower.

Both accelerated SHA-1 and SHA-256 are pretty consistant, independent of input size.

SHA-3 has this weird step-function thing going on, where steps are about… hmm… that looks susipiciously like it could be 136 bytes, which is the block size of the 256-bit variant of SHA-3 that we’re using.

Blake3 is still slow at even 1kB, which is enough to encode at least one UTF-8-encoded tweet.

The Battlefield

All the benchmarks were run on my PC with the following specs:

CPU RAM
AMD Ryzen Threadripper 1950X 32GiB

Conclusion

For my weird “hashing English words” use case, Blake2s is the hash function to use based on these numbers. If I relax my needs to include non-cryptographic functions, then MurMurHash3 is probably what I’d go forUn-keyed hashes like MurMurHash3 are susceptible to DoS attacks, so you’d want to use something like SipHash or a fast cryptographic hash depending on your use case..

If I had more data (megabytes or gigabytes) to hash, I’d likely use Blake3. But Blake3’s performance on inputs as big as 1kB is beat out by half of the cryptographic hash functions tested.

That said:

  1. These functions are all very fast in absolute terms.
  2. These benchmarks might look very different if you run them.

The code I used to run the benchmarks is here, but be forewarned, it was written in a fugue state.

Honourable Mentions

There are a lot of fine hash functions, particularly non-cryptographic hash functions, that were not included in this analysis for various reasons:

KangarooTwelve is basically Keccak (which is basically SHA-3), but with many fewer rounds. It’s meant to be fast, but when I tested it using the pycrpto library, it was orders of magnitude slower than SHA-3. I’m going to chalk this up to an implementation error on my or pycrpto’s part here, since the results really made no sense.

AquaHash is a non-cryptographic hash by the author of Metrohash. It is specifically designed deliver “state-of-the-art performance across all key sizes” by employing two algorithms, one that kicks in for smaller inputs, and the other for larger ones. This would be great to see in this analysis but I couldn’t find any Python bindings for the reference C code.

Haraka v2 is something that comes up a lot when I search for “short-input hash”. It’s cryptographic and uses AES instructions to accelerate its compuation. There looks to be a reference C and Python implementation online, but I didn’t realize that until after I ran the analysis. Would be very neat to check out.

Meowhash is the Hacker News darling by the Handmade Hero himself and his company Molly Rocket. It’s a non-cryptographic hash function which is accelerated by the AES CPU instruction. It’s meant to be really fast, but I couldn’t find a Python implementation/binding, so it’s not included.

Sharing is Caring

Comments

Leave a comment as a reply to this Mastodon post and it'll show up right here.