Trustworthy Records Retention - Database Security: Applications and Trends

Databases Reference

In-Depth Information

A GHT is a balanced tree-based data structure that does not require

periodic rebalancing. In a GHT, predefined hashes of the record key determine

all its possible lookup or insertion locations. The locations where a record can

be inserted or looked up are therefore immutable. To insert or look up a record

in a GHT, the record key is hashed to obtain a position within the root node

(see Figure 2). If the corresponding node position at the root node is empty,

the record is inserted there. If there is a collision, the key is rehashed (using

a different hash function) and an attempt is made to insert the key in the

appropriate subtree of the root node. This process is repeated until an empty

node position is found. If a record cannot be inserted in any existing node of

the tree, a new leaf node is added.

Full-text search (keyword search) is the most convenient way to query un-

structured records such as email bodies and reports. Search engines typically

use inverted indexes for this purpose [9]. As shown in Figure 3(a), an inverted

index comprises a dictionary of terms (i.e., words that appear in documents)

plus a posting list for each term, containing the identifiers of all records con-

taining that term (with additional metadata such as term frequency within

the record, term type, and term position within the record). Queries are an-

swered by scanning the posting lists of the terms in the query. The records

referenced in the posting lists are assigned scores for the query, based on simi-

larity measures (e.g., cosine, Okapi, pivoted, Dirichlet [47, 36]). The scores are

used to rank the records, producing an ordered list of results. Multi-keyword

conjunctive queries can be answered by intersecting the posting lists of the

query terms. To make the intersection fast, an additional index such as a B+

tree is usually kept for each posting list, and a zigzag join is used to perform

the intersection [46].

Query

1

3

11

17

36

Query

Data

Base

Worm

Index

1

3

11

17

36

Data

Base

Worm

Index

3

9

31

3

19

3#Data 3#Base 9#Data 19#Base

31#Data

7

36

7

36

3

(a) Ordinary Inverted Index

(b) Inverted Index After Merging

Fig. 3. Inverted indexes. With each keyword, a posting list ofIDsofdocuments

containing that keyword is stored. Each posting list is stored as a separate file on

WORM. After merging, the keyword (or its hash) must also be stored in the posting

list.

For a trustworthy version of inverted indexes, each posting list can be

stored in a separate append-only file on WORM storage [20]. In a traditional

inverted index, updates are processed in batches that involve sorting all the

entries and regenerating the posting lists in their entirety; of course this is not

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home