Trustworthy Records Retention - Database Security: Applications and Trends

Databases Reference

In-Depth Information

formation about its presence in the original document. More formally, let S

be a reconstruction of the set of words in d , generated by the adversary by

scanning directories, indexes, and migration logs. We say that d 's deletion is

strongly secure iff

∀

w.P ( w

∈

d

|

w

∈

S )= P ( w

∈

d )

∀

w.P ( w

∈

d

|

w/

∈

S )= P ( w

∈

d ) ,

where P ( w

∈

d ) denotes the probability of the word w belonging to document

d , while P ( w

∈

d

|

w

∈

S ) denotes the probability of w being in D given that

w is in S .

Unfortunately, the trustworthy indexing schemes discussed earlier do not

support strongly secure deletion. Generalized hash trees offer weakly secure

deletion, in which Mallory cannot prove that a deleted record contained a

specific set of terms [21]. (We can also define probabilistically secure dele-

tion, in which the probability of the record having a specific reconstruction

is bounded above.) Trustworthy inverted indexes and jump indexes are even

more problematic with respect to deletion.

To see why deletion is dicult, consider an email from Alice to Bob with

the text Please sell 10,000 shares today. An inverted index will contain entries

for the terms in the email, and may also note the position of the words in the

record. If the record containing the email is later deleted, it may be possible

to exactly recreate the email by looking at its index entries. Therefore, the

index entries must also be removed to ensure non-reproducibility of deleted

records. WORM devices do not support erasure of short byte sequences, and

they are unlikely to do so in the future.

Even if the WORM device does allow the corresponding index entries to

be erased, structural properties of the index may allow an adversary to infer

that they existed. For example, if the order of keyword insertion is significant

in determining the current structure of the index (as is true for trustworthy

inverted indexes and jump indexes), then the positions of the erased entries in

the index may allow one to infer that a particular erased document contained

certain keywords [21]. For example, one might be able to infer that Alice sent

an email about shares to Bob on a certain day, without knowing the exact

order of words in the email.

To address this problem, one might imagine dividing expiration times into

epochs, and keeping a separate set of indexes for records expiring in each

epoch. Then one could delete the entire epoch of indexes once the epoch is

over. Unfortunately, litigation holds may require a document to be retained

even after its mandatory retention period is over, making it impractical to

delete large batches of records and index entries based simply on their expi-

ration date. In general, there is a tradeoff between the ease and eciency of

the deletion approach and the trustworthiness guarantees of the approach to

implementing litigation holds.

Another option is to rebuild the index in a trustworthy manner when

records are deleted. However, the record arrival rate of today will be the

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home