Once a data block is released, it is no longer possible to go back and increase
the level of generalization. On the other hand, new releases may sharpen an
attacker's view of the data and may make the overall data set more susceptible
to attack. For example, when different views of the data are released
sequentially, a join on the two releases [109] may be used to sharpen the
ability to distinguish particular records in the data. A technique discussed in
[109] relies on lossy joins in order to cripple an attack based on global
quasi-identifiers. The intuition behind this approach is that if the join is
lossy enough, it reduces the confidence of the attacker in relating records
from previous views to the current release. Thus, the inability to link
successive releases is key to preventing further discovery of the identity of
records.
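To make the linking risk concrete, the following is a minimal sketch of how an attacker might join two sequentially released views on shared attributes to obtain sharper records, and how a lossier join dilutes that inference. The table contents, column names, and the use of pandas are illustrative assumptions and are not the construction of [109].

import pandas as pd

# Hypothetical first release: generalized quasi-identifiers plus a sensitive column.
release_1 = pd.DataFrame({
    "age_range":  ["20-29", "20-29", "30-39"],
    "zip_prefix": ["481**", "481**", "482**"],
    "disease":    ["flu", "cold", "cancer"],
})

# Hypothetical second release of the same population under a different view,
# sharing the age_range column with the first release.
release_2 = pd.DataFrame({
    "age_range": ["20-29", "30-39", "30-39"],
    "gender":    ["M", "F", "F"],
    "disease":   ["flu", "cancer", "cold"],
})

# Joining on the shared attributes sharpens the attacker's view: each surviving
# row links zip_prefix and gender for the same underlying individual.
sharp = release_1.merge(release_2, on=["age_range", "disease"])
print(sharp)

# A lossier join (coarser join key) admits spurious pairings between the two
# releases, lowering the attacker's confidence in any single linkage.
lossy = release_1.merge(release_2, on=["age_range"])
print(lossy)

In this toy example the exact join pins each matched pair of rows down uniquely, while the coarser join produces additional spurious pairings; amplifying this ambiguity is precisely the effect the lossy-join defense aims for.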
3.4 The l-diversity Method
The k-anonymity model is attractive because of the simplicity of its definition
and the numerous algorithms available to perform the anonymization.
Nevertheless, the technique is susceptible to several kinds of attack,
especially when background knowledge is available to the attacker. Some such
attacks are as follows:
Homogeneity Attack: In this attack, all the values for a sensitive attribute
within a group of k records are the same. Therefore, even though the data is
k-anonymized, the value of the sensitive attribute for that group of k records
can be predicted exactly (a small sketch following these two examples
illustrates this).
Background Knowledge Attack: In this attack, the adversary can use an
association between one or more quasi-identifier attributes and the sensitive
attribute in order to further narrow down the possible values of the sensitive
field. An example given in [77] is one in which background knowledge of the low
incidence of heart attacks among the Japanese could be used to narrow down the
possible values of the sensitive field recording which disease a patient
might have.
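The following small sketch illustrates both attacks on an invented 2-anonymized table; the attribute values and the background-knowledge rule are assumptions made purely for illustration.

# Hypothetical 2-anonymized records: (age_range, zip_prefix) are the generalized
# quasi-identifiers and "disease" is the sensitive attribute.
records = [
    ("20-29", "130**", "heart disease"),
    ("20-29", "130**", "heart disease"),   # homogeneous q-block
    ("30-39", "148**", "heart disease"),
    ("30-39", "148**", "viral infection"),
]

# Group the records into q-blocks by their quasi-identifier values.
blocks = {}
for age_range, zip_prefix, disease in records:
    blocks.setdefault((age_range, zip_prefix), []).append(disease)

# Homogeneity attack: a q-block with a single distinct sensitive value
# discloses that value exactly, despite the 2-anonymity.
for qid, diseases in blocks.items():
    if len(set(diseases)) == 1:
        print(f"q-block {qid}: sensitive value disclosed as {diseases[0]!r}")

# Background-knowledge attack: if the attacker knows the target in the second
# q-block is very unlikely to have heart disease, that block effectively
# collapses to a single remaining value.
remaining = [d for d in blocks[("30-39", "148**")] if d != "heart disease"]
print(f"values still plausible for the target: {remaining}")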
Clearly, while k-anonymity is effective in preventing identification of a
record, it may not always be effective in preventing inference of the sensitive
values of the attributes of that record. Therefore, the technique of
l-diversity was proposed, which not only maintains the minimum group size of k,
but also focuses on maintaining the diversity of the sensitive attributes. The
l-diversity model [77] for privacy is defined as follows:
Definition 2. Let a q-block be a set of tuples such that its non-sensitive
values generalize to q. A q-block is l-diverse if it contains l "well
represented" values for the sensitive attribute S. A table is l-diverse if
every q-block in it is l-diverse.
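As a minimal illustration, the sketch below checks the simplest reading of this definition, in which "well represented" is taken to mean at least l distinct sensitive values per q-block (distinct l-diversity); the function name and the example table are assumptions for illustration and not code from [77].

def is_l_diverse(table, sensitive_index, l):
    """Check distinct l-diversity: every q-block must contain at least
    l distinct values of the sensitive attribute.

    table is a list of tuples whose columns other than sensitive_index
    hold the (already generalized) quasi-identifier values.
    """
    blocks = {}
    for row in table:
        qid = tuple(v for i, v in enumerate(row) if i != sensitive_index)
        blocks.setdefault(qid, set()).add(row[sensitive_index])
    return all(len(values) >= l for values in blocks.values())

# The table from the previous sketch is 2-anonymous but not 2-diverse, because
# its first q-block contains only a single sensitive value.
table = [
    ("20-29", "130**", "heart disease"),
    ("20-29", "130**", "heart disease"),
    ("30-39", "148**", "heart disease"),
    ("30-39", "148**", "viral infection"),
]
print(is_l_diverse(table, sensitive_index=2, l=2))   # prints False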
A number of different instantiations for the l -diversity definition are discussed
in [77]. We note that when there are multiple sensitive attributes, then the