Databases Reference
In-Depth Information
formation about its presence in the original document. More formally, let S
be a reconstruction of the set of words in d , generated by the adversary by
scanning directories, indexes, and migration logs. We say that d 's deletion is
strongly secure iff
w.P ( w
d
|
w
S )= P ( w
d )
w.P ( w
d
|
w/
S )= P ( w
d ) ,
where P ( w
d ) denotes the probability of the word w belonging to document
d , while P ( w
d
|
w
S ) denotes the probability of w being in D given that
w is in S .
Unfortunately, the trustworthy indexing schemes discussed earlier do not
support strongly secure deletion. Generalized hash trees offer weakly secure
deletion, in which Mallory cannot prove that a deleted record contained a
specific set of terms [21]. (We can also define probabilistically secure dele-
tion, in which the probability of the record having a specific reconstruction
is bounded above.) Trustworthy inverted indexes and jump indexes are even
more problematic with respect to deletion.
To see why deletion is dicult, consider an email from Alice to Bob with
the text Please sell 10,000 shares today. An inverted index will contain entries
for the terms in the email, and may also note the position of the words in the
record. If the record containing the email is later deleted, it may be possible
to exactly recreate the email by looking at its index entries. Therefore, the
index entries must also be removed to ensure non-reproducibility of deleted
records. WORM devices do not support erasure of short byte sequences, and
they are unlikely to do so in the future.
Even if the WORM device does allow the corresponding index entries to
be erased, structural properties of the index may allow an adversary to infer
that they existed. For example, if the order of keyword insertion is significant
in determining the current structure of the index (as is true for trustworthy
inverted indexes and jump indexes), then the positions of the erased entries in
the index may allow one to infer that a particular erased document contained
certain keywords [21]. For example, one might be able to infer that Alice sent
an email about shares to Bob on a certain day, without knowing the exact
order of words in the email.
To address this problem, one might imagine dividing expiration times into
epochs, and keeping a separate set of indexes for records expiring in each
epoch. Then one could delete the entire epoch of indexes once the epoch is
over. Unfortunately, litigation holds may require a document to be retained
even after its mandatory retention period is over, making it impractical to
delete large batches of records and index entries based simply on their expi-
ration date. In general, there is a tradeoff between the ease and eciency of
the deletion approach and the trustworthiness guarantees of the approach to
implementing litigation holds.
Another option is to rebuild the index in a trustworthy manner when
records are deleted. However, the record arrival rate of today will be the
Search WWH ::




Custom Search