Privacy Preserving Publication: Anonymization Frameworks and Principles - Database Security: Applications and Trends

Databases Reference

In-Depth Information

Definition 3 ([11]). Given a QI-group, use n 1 , n 2 , ..., n m to denote the num-

ber of tuples having the most, 2nd most, ..., the least frequent sensitive values

in the group, respectively. The QI-group obeys recursive ( c, l )-diversity ,if

the next inequality holds 1 :

n 1 ≤

( n l + n l +1 + ... + n m ) ,

(1)

where c is a certain constant, and l is an integer at most m.

For instance, the last QI-group in Figure 2 satisfies recursive (2, 3)-diversity.

Specifically, for that QI-group, n 1 =2,and n 2 = n 3 = 1. Setting c to 2 and l

to 3, Inequality 1 becomes 2

It is not hard to observe an interesting connection between Definitions 2

and 3. A QI-group obeys ( c, l )-diversity, if and only if, after eliminating the

tuples with any l different sensitive values, the remaining set of tuples still

obeys frequency c

≤

c +1 -diversity. This connection leads to a crucial property

of Definition 3: if all the QI-groups in a generalized table satisfy recursive

( c, l )-diversity, an adversary can correctly discover the sensitive value of an

individual with probability at most c/ ( c + 1), provided that the adversary can

preclude at most l

2 values as belonging to the victim individual.

We have discussed three different versions of l -diversity. Machanavajjhala

et al. formulate several other versions, as can be found in [11], which also

explains the computation of generalized tables conforming to this principle.

Currently, we are not aware of any published work on the hardness of finding

the optimal l -diverse table that minimizes a certain information-loss metric.

Nevertheless, it appears that the problem should be NP-hard for most metrics.

In any case, l -diverse generalization also inherits the defect of k -anonymity

that the amount of information loss will be inevitably large, when the number

of QI-attributes is high. This defect can have rather negative influences on the

utility of the published table, as discussed in the next section.

−

4Anatomy

So far our discussion has employed generalization as the underlying

anonymization framework. In this section, we proceed to introduce another

framework: anatomy. As with generalization, anatomy can be combined with

k -anonymity and l -diversity. The following analysis focuses on l -diversity, due

to its obvious advantages over k -anonymity. In particular, we adopt frequency

l -diversity (Definition 2), to avoid the complication of recursive ( c, l )-diversity.

Accordingly, we will assume that probabilistic homogeneity attacks are the ob-

jective of privacy protection. The discussion in this section, however, can be

extended to k -anonymity and other versions of l -diversity in a straightforward

manner.

1 In the original proposition of [11], the “

” in Inequality 1 is “ < ”. We adopt “

”

≤

here to simplify discussion, but the rationale extends naturally to “ < ” as well.

Database Security: Applications and Trends

Search WWH ::

Custom Search

Home