Hippocratic Databases: Current Capabilities and Future Trends - Database Security: Applications and Trends

Databases Reference

In-Depth Information

copy of the sensitive data that is the target of leakage, referred to as table S,

to compute a ranking of queries in the query log over the database suspected

of being the source of leakage. We refer to the output of each query in the

query log as query table Qi. The ranking system ranks each Qi with respect

to its similarity to S.

To perform the ranking, we use one of three measures of proximity be-

tween the sensitive table and the query table. The first, based on informa-

tion retrieval's document frequency similarity measure, computes proximity

by considering partial tuple matches between two tables, while factoring in

the likelihood of the match. Partial tuples are formed by matching a tuple

in Qi with a tuple in S and forming a partial tuple based upon matching at-

tribute pairs. Document frequency is computed as the number of query tables

having the partial tuple. The ranking method favors Qi's having partial tuples

with low document frequency. This method has high sensitivity to even single

partial tuple matches occurring in few query results.

The second, based on record linkage and expectation maximization, mea-

sures proximity by aggregating scores of individual tuple matches. This

method was originally devised to match individuals appearing in different

census records in the absence of surrogate keys uniquely identifying these in-

dividuals in each list such as social security number. The method is adaptive

and estimates the number of record pairs that are true matches and non-

matches based upon attributes that are common to both lists and the likeli-

hood of a random match. It is applied to our information leakage problem by

matching each query table Qi with S. This method has lower sensitivity than

the previous one to unique combinations of matching attributes appearing in

few Qi's.

The third is based on the minimum description length principle and mea-

sures proximity by computing the minimum length (maximum probability)

derivation of the sensitive table from the query result. In contrast with the pre-

vious two methods, this method also examines the similarity of tuples within

S when measuring the similarity of tuples in Qi with tuples in S. This method

is therefore better at differentiating Qi's having multiple matches with near

duplicates in S with Qi's having multiple matches with distinct information

in S.

The following scenario describes a practical application of query ranking.

Suppose that a major credit card company sells copies of its customer database

to its business aliates under a very strict confidentiality agreement. The af-

filiates are allowed to analyze the database for their own business intelligence

purposes, but may not disclose any of the customer information to a third

party. Several months later, a copy of this database is released into the public

domain. The credit card company would like to determine the source of the

disclosure so that it may terminate its business relationship with this aliate.

Using the HDB query ranking system, the company could evaluate the prox-

imity of the leaked database to the queries that accessed the database by using

the appropriate proximity measure algorithm, and rank the relative likelihood

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home