Databases Reference
In-Depth Information
Definition 3 ([11]).
Given a QI-group, use n
1
, n
2
, ..., n
m
to denote the num-
ber of tuples having the most, 2nd most, ..., the least frequent sensitive values
in the group, respectively. The QI-group obeys
recursive (
c, l
)-diversity
,if
the next inequality holds
1
:
n
1
≤
c
·
(
n
l
+
n
l
+1
+
...
+
n
m
)
,
(1)
where c is a certain constant, and l is an integer at most m.
For instance, the last QI-group in Figure 2 satisfies recursive (2, 3)-diversity.
Specifically, for that QI-group,
n
1
=2,and
n
2
=
n
3
= 1. Setting
c
to 2 and
l
to 3, Inequality 1 becomes 2
1.
It is not hard to observe an interesting connection between Definitions 2
and 3. A QI-group obeys (
c, l
)-diversity, if and only if, after eliminating the
tuples with any
l
different sensitive values, the remaining set of tuples still
obeys frequency
c
≤
2
·
c
+1
-diversity. This connection leads to a crucial property
of Definition 3: if all the QI-groups in a generalized table satisfy recursive
(
c, l
)-diversity, an adversary can correctly discover the sensitive value of an
individual with probability at most
c/
(
c
+ 1), provided that the adversary can
preclude at most
l
2 values as belonging to the victim individual.
We have discussed three different versions of
l
-diversity. Machanavajjhala
et al. formulate several other versions, as can be found in [11], which also
explains the computation of generalized tables conforming to this principle.
Currently, we are not aware of any published work on the hardness of finding
the optimal
l
-diverse table that minimizes a certain information-loss metric.
Nevertheless, it appears that the problem should be NP-hard for most metrics.
In any case,
l
-diverse generalization also inherits the defect of
k
-anonymity
that the amount of information loss will be inevitably large, when the number
of QI-attributes is high. This defect can have rather negative influences on the
utility of the published table, as discussed in the next section.
−
4Anatomy
So far our discussion has employed generalization as the underlying
anonymization framework. In this section, we proceed to introduce another
framework: anatomy. As with generalization, anatomy can be combined with
k
-anonymity and
l
-diversity. The following analysis focuses on
l
-diversity, due
to its obvious advantages over
k
-anonymity. In particular, we adopt frequency
l
-diversity (Definition 2), to avoid the complication of recursive (
c, l
)-diversity.
Accordingly, we will assume that probabilistic homogeneity attacks are the ob-
jective of privacy protection. The discussion in this section, however, can be
extended to
k
-anonymity and other versions of
l
-diversity in a straightforward
manner.
1
In the original proposition of [11], the “
” in Inequality 1 is “
<
”. We adopt “
”
≤
≤
here to simplify discussion, but the rationale extends naturally to “
<
” as well.