Fig. 2. Original/re-constructed pdf of tuple 1 in Table 3a: (a) original; (b) approximated from generalization; (c) approximated from anatomy (horizontal axis: Age).
tion are analogous to those discussed for the previous case where A1 is true
and A2 is not.
4.6 Correlation Preservation
A good publication method should preserve both privacy and data correlation
(between QI- and sensitive attributes). Using a concrete query, we have shown
in Section 4.2 that anatomy allows more effective aggregate analysis than
generalization. Next, we provide the underlying theoretical rationale.
Obviously, for any tuple $t \in T$, every publication method will lose certain information of $t$ (otherwise, it would be equivalent to disclosing $t$ directly, contradicting the goal of privacy). On the other hand, the method should permit the development of an approximate model of $t$ (otherwise, the published table is useless for research). Hence, the quality of correlation preservation depends on how accurate the reconstructed model is.
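To make this trade-off concrete, the Python sketch below compares the exact model of a tuple whose age is 23 with two coarser approximations. Everything else is an assumption for illustration only: ages are discretized into integer bins from 20 to 60, each approximation spreads probability uniformly over a hypothetical age interval, and the sum of squared per-bin differences is just one possible accuracy measure, not necessarily the one adopted in this paper.

```python
# Sketch only: integer age bins 20..60 are an assumption (the span of the
# Age axis in Fig. 2), as are the intervals and the error measure below.
AGES = range(20, 61)


def point_mass(age):
    """Exact model of a tuple's age: all probability on a single bin."""
    return {a: 1.0 if a == age else 0.0 for a in AGES}


def uniform(lo, hi):
    """Approximate model: probability spread evenly over ages lo..hi inclusive."""
    n = hi - lo + 1
    return {a: 1.0 / n if lo <= a <= hi else 0.0 for a in AGES}


def squared_error(p, q):
    """Sum of squared per-bin differences (one possible accuracy measure)."""
    return sum((p[a] - q[a]) ** 2 for a in AGES)


exact = point_mass(23)    # the tuple's true age
wide = uniform(21, 60)    # hypothetical coarse interval (40 bins)
narrow = uniform(21, 30)  # hypothetical finer interval (10 bins)

err_wide = squared_error(exact, wide)
err_narrow = squared_error(exact, narrow)
print(f"wide: {err_wide:.3f}, narrow: {err_narrow:.3f}")
```

For a uniform spread over n bins that contain the true age, this error equals 1 - 1/n, so the finer interval (n = 10) scores 0.9 and the coarser one (n = 40) scores 0.975: the less information a method discards, the closer its reconstructed model is to the original.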
Let us first examine the correlation between Age and Disease in the microdata of Table 3a. The two attributes define a 2D space $DS_{A,D}$. Every tuple in the table can be mapped to a point in $DS_{A,D}$. For example, tuple 1, denoted $t_1$, corresponds to the point $(t_1[A], t_1[D])$, where $t_1[A]$ is the age 23 of $t_1$, and $t_1[D]$ is its disease, 'pneumonia'.
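The mapping of a tuple to a point in $DS_{A,D}$ can be sketched in Python as follows. Because Disease is categorical, obtaining numeric coordinates requires some fixed ordering of its domain; the sketch assumes a hypothetical three-value domain (`DISEASES`), and the names `MicroTuple`, `to_point`, and `to_coords` are illustrative, not taken from the paper.

```python
from typing import NamedTuple

# Hypothetical, fixed ordering of the Disease domain (assumption).
DISEASES = ["bronchitis", "flu", "pneumonia"]


class MicroTuple(NamedTuple):
    """One microdata tuple restricted to the attributes Age (A) and Disease (D)."""
    age: int
    disease: str


def to_point(t):
    """Map t to its point (t[A], t[D]) in the space DS_{A,D}."""
    return (t.age, t.disease)


def to_coords(t):
    """Numeric coordinates: the disease is replaced by its index in DISEASES."""
    return (t.age, DISEASES.index(t.disease))


t1 = MicroTuple(age=23, disease="pneumonia")  # tuple 1 of Table 3a
print(to_point(t1))   # (23, 'pneumonia')
print(to_coords(t1))  # (23, 2)
```

Any fixed ordering of the disease values works for this purpose; the ordering only determines where each categorical value sits along the Disease axis.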