Database Reference
In-Depth Information
are space ecient and also enable very fast sequential processing of
list elements. On the other hand, inserting or deleting elements in the
token lists becomes expensive.
In situations where fast updates are important, token lists can be
represented by linked lists or binary trees in main memory, or B-trees in
external memory. The drawback is the increased storage cost both for
main memory and external memory structures, and also the fact that
external memory data structures do not support sequential accesses
over the complete token lists as eciently as flat files. On the other
hand, the advantage is that insertions and deletions can be performed
very eciently.
A subtle but very important point regarding updates arises when
considering token lists that contain L p -norms. Recall that L p -norms are
computed based on token weights. For certain token weighing schemes,
like idf weights, token weights depend on the total number of strings
overall and the total number of strings containing the given token.
As strings get inserted, deleted, or updated, both the total number
of strings and the number of strings containing a given token might
change, and as a result the weight of tokens can change as well as the
L p -norms of strings that are not directly related with a given insertion,
deletion, or modification.
Consider for example idf weights, where the weight of a token is
a function of the total number of strings
(see Definition 2.11). A
single string insertion or deletion will affect the idf s of all tokens, and
hence render the L p -norms of all entries in the inverted lists in need of
updating. For example, if we insert a string that contains token λ , the
idf weight W ( λ ) with change, and as a consequence, the L p -norm of
all other strings already appearing in the index that happen to contain
λ will change. Hence, a single string update can result in a cascading
update effect on the token lists of the inverted index that could be very
expensive.
To alleviate the cost of updates in such scenarios a technique based
on delayed propagation of updates has been proposed. The idea is to
keep stale information in the inverted lists as long as certain guarantees
on the answers returned can be provided. Once the computed L p -norms
|
S
|
Search WWH ::




Custom Search