2.1 Attacks
In this model we only consider attackers who access the data legally by inspecting the published data V(D), using it together with external knowledge to infer information about the secret S(D). The defense against unauthorized access to the database is beyond the scope of this model.
Possible databases. Ideally, the attacker would like to reverse-engineer D starting from the observed published data V(D). This would immediately lead to the full disclosure of the secret: the attacker could compute the secret by directly running S over D. Of course, V is likely to be a lossy data transformation, thus precluding the unequivocal identification of its arguments from its output. In general there may be (potentially infinitely) many databases which have the same image as D under V. The attacker cannot distinguish among them solely by observing the published data V(D), regardless of the computational resources at his disposal. Therefore, in the absence of external knowledge about D, all databases with the same image are possible from the attacker's point of view (we will shortly introduce the attacker's external knowledge into the model). We therefore refer to the set [D]_V of databases as the possible databases given V(D):

    [D]_V := { D' | V(D') = V(D) }.
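The definition reduces membership in [D]_V to a simple test: a candidate database is possible exactly when its image under V equals the observed published data. A minimal sketch in Python (the view function and the tuples below are illustrative, not taken from the text):

```python
# A database is modeled as a set of tuples; a view V maps a database
# to its published data.
def possible(candidate, published, V):
    """A candidate database is in [D]_V iff it has the same image
    under the view V as the database that produced the published data."""
    return V(candidate) == published

# Illustrative view: project each (patient, doctor, ailment) tuple
# onto the patient attribute only.
V = lambda db: {p for (p, d, a) in db}

D = {("John", "doc1", "flu"), ("Jane", "doc2", "pneumonia")}
published = V(D)

# A database with different doctors has the same image under V,
# so the attacker cannot rule it out from the published data alone.
D_alt = {("John", "docX", "flu"), ("Jane", "docY", "pneumonia")}
print(possible(D_alt, published, V))  # True
```

Note that the test never inspects the hidden columns directly: indistinguishability is entirely a property of the view's output.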
Example 2. Continuing Example 1, assume that the owner publishes a view listing all the patients, V_p(p): PDA(p, d, a), and one listing all ailments treated by the hospital, V_a(a): PDA(p, d, a). Assume that on the actual database D, V_p(D) yields { John, Jane } and V_a(D) yields { flu, pneumonia }. Then some of the possible databases corresponding to the observed views are D_1 = { (John, doc_1, flu), (Jane, doc_2, pneumonia) } and D_2 = { (John, doc_3, flu), (John, doc_3, pneumonia), (Jane, doc_4, flu) }, etc., where the doc_i are unknown doctor names.
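Both databases in Example 2 can be checked mechanically: each agrees with the actual database on every published view. A short sketch, writing the two views as projections over PDA(p, d, a):

```python
# The two published views from Example 2, as projections over PDA(p, d, a).
V_p = lambda db: {p for (p, d, a) in db}   # all patients
V_a = lambda db: {a for (p, d, a) in db}   # all ailments treated

D1 = {("John", "doc1", "flu"), ("Jane", "doc2", "pneumonia")}
D2 = {("John", "doc3", "flu"),
      ("John", "doc3", "pneumonia"),
      ("Jane", "doc4", "flu")}

# Both candidates produce exactly the observed view outputs,
# so both belong to the set of possible databases.
for D in (D1, D2):
    assert V_p(D) == {"John", "Jane"}
    assert V_a(D) == {"flu", "pneumonia"}
```

Note that D1 and D2 disagree on which patient has which ailment, yet are indistinguishable through the views: that disagreement is precisely what keeps the secret hidden.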
Clearly the set of possible databases may be very large. For example, con-
sider the case when the published data is a projection of a table. By observing
the published table (and using no external knowledge about the data), an at-
tacker must assume any possible completion for the missing columns. This is
the case in Example 2 if the attacker does not know the set of all possible
doctors.
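To see how quickly the possible databases multiply, one can enumerate the completions of a projected table over an assumed finite domain for the hidden column; the domain below is purely hypothetical (the text's point is that the attacker may not even know it):

```python
from itertools import product

# Published view: the projection of PDA onto (patient, ailment).
# The doctor column is hidden; the attacker must consider every completion.
published = [("John", "flu"), ("Jane", "pneumonia")]
doctors = ["doc1", "doc2", "doc3"]  # assumed finite domain, for illustration

completions = [
    {(p, doc, a) for (p, a), doc in zip(published, choice)}
    for choice in product(doctors, repeat=len(published))
]
print(len(completions))  # 3^2 = 9 possible databases
```

With n candidate doctors and k published rows there are already n^k completions, and with an unknown doctor domain the set is unbounded, which is why the attacker may be unable to enumerate it at all.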
It is therefore not a priori given that the attacker is even able to enumerate
all possible databases. In the following, we assume the worst-case scenario for
the owner, namely that the attacker comes up with some finite representation
of the set of possible databases which he uses for reasoning about the secret.
Note that the more advantage we grant the attacker, the stronger any privacy guarantee established under these assumptions will be.
Possible secrets. Since the owner cares about guarding only the secret
(rather than the non-sensitive parts of the database), the privacy model fo-
cuses on possible secrets. From a reasonable attacker's point of view, a secret