Database Reference
In-Depth Information
combination using the inclusion-exclusion principle: |featureA:valueX|
+ |featureB:valueY| - |featureA:valueX union featureB:valueY|.
So long as the number of different values each feature can take on is
small, this can be done interactively. If the possible values each feature
can take on is very large, the best approach is to use the Count-Min
sketch to identify the most common feature values to control the
number presented in the pivot table user interface.
The Count-Min Sketch
The final sketch algorithm to discuss is the popular Count-Min sketch. This
algorithm essentially approximates a map (also called a multi-set) where
the distinct values act as the keys and the number of times an element
has been seen is the value. The simplest implementations of the Count-Min
sketchsupportapproximatelookupsinthismap,called point queries .When
the distinct values have some sort of intrinsic ordering, the sketch can be
considered an approximation of a histogram (also known as a frequency
table).
With a bit more work, the Count-Min sketch can also support range queries
that approximate the frequency of elements in the range [a,b]. After range
queries are supported, it is also possible to estimate order statistics using a
binary search on the ranges. This is discussed in detail later in this chapter.
Search WWH ::




Custom Search