Databases Reference
In-Depth Information
trustworthy. To be trustworthy, we can append the terms of newly arriving
records to the appropriate posting lists at the time the records arrive. With-
out additional optimizations, however, this approach is too slow to support
real-time insertion of typical business documents: each new posting list entry
requires a file append operation, which in turn requires a random I/O. The
performance can be improved vastly by merging the posting lists for different
terms until the tails of all posting lists fit into the storage server's cache (see
Figure 3(b)). For example, with a cache of 256 MB (which is less than one
would find in today's storage servers), one can index 500 new 500-keyword
documents per second. Compared to a traditional unmerged implementation
of an inverted index, query workload performance drops by less than 10%,
which is quite good. Intuitively, merging posting lists has little effect on query
performance because only a small set of terms is widely used in queries. As
long as these “popular” terms are not merged together, performance is little
affected by merging.
23
23
25
713
31
713
31
32
33
(a) B+ Tree in WORM Storage
2
4
7 11
13 19
23 29
31
33
(b) B+ Tree Manipulated to Hide
Entries
4
7 11
13 19
23 29
31
25 26
32
2
Fig. 4. Why B+ trees are untrustworthy. (a) B+ tree in WORM storage. New
elements are added at the leaf level. When a leaf node fills up, a new leaf is created
and an entry is added to the parent that points to the new leaf. (b) The adversary
can add entry 25 to the root and point it to a spurious subtree. This effectively hides
entry 31 from subsequent queries.
For trustworthy conjunctive keyword search, the auxiliary B+ trees used
in zigzag joins must also be trustworthy. One can create a B+ tree for an
increasing sequence of document IDs without any node splits or merges, by
building the tree from the bottom up, as shown in Figure 4 for the special case
of a 2-3 tree. Unfortunately, such an index structure is also not trustworthy,
even when kept on WORM storage, because the path to each entry is not
immutable. For instance, Figure 4(b) shows that the adversary can hide entry
31 by creating a separate subtree that does not contain 31, and adding an entry
25 at the root to lead to the new subtree. Other techniques like binary search
can also be compromised by the adversary, by appending smaller numbers at
the tail of the sequence. For example, binary search on the leaves of the tree
in Figure 4 would miss 31 because of the malicious entry 30 at the end.
Search WWH ::




Custom Search