Database Reference
In-Depth Information
They also try to be uniformly distributed such that random inputs produce a
uniform distribution among the outputs.
For sketch applications, speed is the primary requirement for any hash
function, particularly when used in a streaming setting. This mostly
eliminates cryptographic hash functions such as MD5 and SHA1 from
contention. Although these hashes are very random, they are intentionally
designed to be slow to compute. Despite this fact, cryptographic hashes are
often used in real-world implementations because they are built in to many
programming languages' common libraries.
Rather than use cryptographic hashes, most sketch implementations use a
varietyoffasthashfunctionsthatbettersuittheirperformance needs.These
include the FNV hash, the Jenkins hash, and the MurmurHash. The Fowler,
Noll, and Vo (FNV) hash function was developed in 1991 with variants
supporting output spaces of between 32 and 1,024 bits. It has one of the
simplest implementations of any hash functions, relying on two constant
values chosen from a table for the number of bits to be generated. For
example, the 64-bit Java hash is implemented as follows:
static long fnv64(byte[] input) {
long output = 0xcbf29ce484222325L;
for(var i=0;i<input.length;i++) {
output ^= (long)input[i];
output *= 0x100000001b3L;
}
return output;
}
The FNV hash works quite well on fairly small inputs and is quite fast
on Intel hardware, but the other two hash functions—Jenkins and
Murmur—are often reported to have better real-world performance. The
MurmurHash seems to perform well in real-world applications and has
32-, 64-, and 128-bit variants available. Code for the Murmur hash is not
provided here—it involves large constant arrays that would be meaningless
to reproduce—but the code provided with this topic makes use of the Java
implementation provided by the Colt library. The Murmur code is also
widely available for other languages under a variety of licenses.
Search WWH ::




Custom Search