Database Reference
In-Depth Information
filter results in a non-overlapping partitioning of strings since every
string corresponds to exactly one L p -norm value; on the other hand
the prefix filter results in an overlapping partitioning of strings, since a
single string will end up in the partition of more than one prefix tokens.
Notice that using the filter tree does not affect what types of algo-
rithms are used to merge lists at the leaf level of the filter tree. In
particular, specific optimizations can still be applied at that point to
speed-up list merging. For example, one can still use the heaviest
first algorithm to merge lists by applying the incremental L p -norm
tightening principle (see Lemma 6.15), even though an initial L p -norm
filter has already been applied at the top of the filter tree.
7.4
Index Updates
Similar updating issues with the ones discussed for full token based
inverted indexes hold for signature based inverted indexes. In order to
be able to perform ecient updates we need to represent token lists
either as binary trees or linked lists in main memory, or as B-trees
in external memory. Some signature techniques suffer from problems
similar to those for inverted indexes when token weights get updated.
For example, for prefix signatures a single token weight update can
change the sort order of tokens, and as a consequence, affect the prefix
signature of every other string in the index. In that case, the complete
index needs to be rebuilt to guarantee correctness. Delayed propagation
of updates in this case will inevitably lead to false dismissals. The same
problem occurs with the prefix enumeration signature. The pigeonhole
signature does not depend on token weights and hence can support
updates eciently.
7.5 Discussion and Related Issues
Filtering techniques are the preferred algorithm for evaluating join
queries, due to the small size of the resulting inverted index. The pre-
fix filter is a very general technique that works independently of the
weighing scheme used. The pigeonhole signature is limited to uniform
weights. The prefix enumeration signature can be used for arbitrary
Search WWH ::




Custom Search