Database Reference
In-Depth Information
multi-set of the frequencies of the elements in the set. When the elements of
the multi-set are ordered, the multi-set approximates a histogram.
This chapter starts with a review of the background concepts common to
all the sketch algorithms. In particular, these algorithms make extensive
use of hash functions. These functions are important in many facets of
computer science, but it is their statistical properties that are useful in
sketch applications. This chapter also reviews some aspects of mathematical
sets. Most of the algorithms in this chapter deal with set operations, and
therearesomeimportantpropertiesofsetsthatareexploitedtoimprovethe
performance of the algorithms.
Registers and Hash Functions
All sketches use the same building blocks to implement their particular
approach: registers and hash functions. This section discusses the
properties of these two building blocks, concentrating on hashes.
Registers
Registers are a straightforward concept. They are simply an array of
counters that store the data for the sketch. Registers in most sketch
applications are small relative to the arrays used in the direct
representation. They require only constant space that does not depend on
the number of input elements, typically with smaller ranges than the
original input. This smaller range allows for representing registers with
fewer bits than the original. For example, the basic versions of the Bloom
Filter only sets registers to 1 or 0, so you can use a bit array instead of a byte
array and cut space requirements by eight times.
Hash Functions
Hash functions are functions that take an input and return a value in some
finite space of m output values, regardless of the length of the input. For
example, a hash function can take a string, which could theoretically be
infinitely long, and return, say, a 32-bit integer, giving you 2 32 possible
outputs.
Hash functions are widely used in computing, and for decades people have
researched and developed a variety of hash functions. All hash functions
are deterministic: The same input will always result in the same output.
Search WWH ::




Custom Search