Databases Reference
In-Depth Information
where c j ( v ) is the number of tuples t
QI j with t [ d +1] = v. Apart from
the tuples (or records) defined earlier, the QIT (or ST) does not contain any
other data.
For instance, based on the 2-diverse partition suggested in Table 3b, anatomy
produces the QIT and ST in Tables 4a and 4b respectively, as explained in
Section 4.2.
When there is no ambiguity, we refer to a pair of QIT and ST collectively
as the anatomized tables . In Section 4.6, we will show that anatomized tables
capture the correlation in T more accurately than generalized tables. For this
purpose, we also need to formalize generalization.
Definition 6. (Generalization) Given a partition of T with m QI-groups,
for any tuple t
T ,a generalized table of T contains a tuple of the form
( QI j [1] ,QI j [2] , ..., QI j [ d ] ,t [ d + 1])
(7)
where QI j ( 1
d)isaninterval 2 covering t [ i ] . Furthermore, QI j [ i ] is identical for all tuples
t
j
m) is the unique QI-group including t,andQI j [ i ] ( 1
i
QI j . Apart from the tuples defined earlier, the table does not contain any
other data.
For instance, let t be tuple 1 in the microdata Table 3a. We have j =1,
namely, t is contained in the first QI-group. In the generalized Table 3b,
QI 1 [1] = [21 , 60] (the generalized age of tuple 1), QI 1 [2] = M, and QI 1 [3] =
[100001 , 60000], which, together with t [4] = pneumonia, form the first tuple.
We would like to point out that, although Definition 5 is based on an l -
diverse partition, in general, anatomy produces a pair of QIT and ST from
any partition (Definition 4) in exactly the same way. In particular, any k -
anonymous or l -diverse table has an anatomized counterpart. We concentrate
on l -diverse partitions to achieve strong privacy preservation. Several algo-
rithms have been developed [18] to compute anatomized tables that minimize
certain metrics of information loss. Interestingly, unlike optimal generalization
that is NP-hard, optimal anatomy can be achieved in polynomial time.
4.4 Privacy Preservation
A pair of anatomized tables provide a convenient way for the data publisher to
find out, for each tuple t
T , all the A s values that an adversary can associate
t with, and the probability of each association. This is formally explained in
the next lemma.
Lemma 1 ([18]). If we perform a natural join QIT ST, the join result is
a table with d +3 attributes, containing records of the form
2 If A qi
i
is categorical, following a common assumption in the literature, we consider
that there is a total ordering on A qi
i
.
Search WWH ::




Custom Search