We can model $t_1$ using a probability density function (pdf) $G_{t_1} : DS_{A,D} \to [0, 1]$. Specifically:

$$G_{t_1}(x) = \begin{cases} 1 & \text{if } x = (t_1[A],\, t_1[D]) \\ 0 & \text{otherwise} \end{cases} \tag{12}$$

where $x$ is a 2D random variable in $DS_{A,D}$. Figure 2a demonstrates the pdf.
Assume that a researcher wants to re-construct an approximate pdf $\tilde{G}^{gen}_{t_1}$ of $t_1$ from the generalized Table 3b. From her/his perspective, $t_1[A]$ can be any value in the interval $[21, 60]$ with equal probability $1/40$, but $t_1[D]$ must be pneumonia. Hence,

$$\tilde{G}^{gen}_{t_1}(x) = \begin{cases} 1/40 & \text{if } x[A] \in [21, 60] \text{ and } x[D] = \text{pneumonia} \\ 0 & \text{otherwise} \end{cases} \tag{13}$$
which is illustrated in Figure 2b.
Instead, suppose that the researcher re-constructs a pdf $\tilde{G}^{ana}_{t_1}$ from the QIT and ST in Tables 4a and 4b. This time, s/he knows that $t_1[A]$ must be 23 (since age is published directly), but $t_1[D]$ can be pneumonia or dyspepsia, each with 50% probability (the ST shows that half of the tuples in QI-group 1 are associated with each of these two diseases). Therefore,

$$\tilde{G}^{ana}_{t_1}(x) = \begin{cases} 1/2 & \text{if } x = (23, \text{pneumonia}) \text{ or } x = (23, \text{dyspepsia}) \\ 0 & \text{otherwise} \end{cases} \tag{14}$$
as shown in Figure 2c. Obviously, the pdf approximated from the anatomized
tables is more accurate than that (Figure 2b) from the generalized table.
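The three pdfs in Equations 12 to 14 can be sketched as small dictionaries mapping each (age, disease) point to its probability. This is only an illustration under stated assumptions: ages are taken to be integers (so the interval [21, 60] holds 40 values), the function names are hypothetical, and the disease counts passed to `anatomized_pdf` are made up, since the text only says the two diseases split QI-group 1 evenly.

```python
from fractions import Fraction


def exact_pdf(age, disease):
    """Equation 12: all probability mass sits on the true (age, disease) pair."""
    return {(age, disease): Fraction(1)}


def generalized_pdf(age_low, age_high, disease):
    """Equation 13: age is uniform over the published interval, disease is known.

    Assumes integer ages, so [21, 60] contains 40 candidate values.
    """
    ages = range(age_low, age_high + 1)
    return {(a, disease): Fraction(1, len(ages)) for a in ages}


def anatomized_pdf(age, disease_counts):
    """Equation 14: age is known, disease follows the group's frequencies in the ST."""
    total = sum(disease_counts.values())
    return {(age, d): Fraction(c, total) for d, c in disease_counts.items()}


g_true = exact_pdf(23, "pneumonia")
g_gen = generalized_pdf(21, 60, "pneumonia")
# Hypothetical counts: only the 50/50 split is given in the text.
g_ana = anatomized_pdf(23, {"pneumonia": 2, "dyspepsia": 2})

for name, pdf in [("true", g_true), ("gen", g_gen), ("ana", g_ana)]:
    # Print the support size and total mass; each total should be 1.
    print(name, len(pdf), sum(pdf.values()))
```

The generalized reconstruction spreads its mass over 40 points, while the anatomized one concentrates it on 2, one of which is the true tuple.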
Towards a more rigorous comparison, given an approximate pdf $\tilde{G}_{t_1}$ (Equation 13 or 14), a natural way of quantifying its approximation quality is to calculate its "$L_2$ distance" from the actual pdf $G_{t_1}$ (Equation 12):

$$\sum_{x \in DS_{A,D}} \left( \tilde{G}_{t_1}(x) - G_{t_1}(x) \right)^2. \tag{15}$$
The distance of $\tilde{G}^{ana}_{t_1}$ is 0.5, indeed significantly lower than the distance 22.5 of $\tilde{G}^{gen}_{t_1}$. Although we focused on $t_1$, in the same way, it is easy to verify that the anatomized tables permit better re-construction of the pdfs of all tuples in Table 3a.
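As a rough check of the distance computation, the sketch below evaluates the sum of squared differences for both reconstructions of $t_1$. It assumes a plain sum over discrete points with integer ages 21 to 60; the name `l2_distance` is hypothetical. Under these assumptions the anatomized distance comes out to 0.5, as stated above. The generalized value depends on how the age domain is discretized and normalized, so this sketch is not expected to reproduce the 22.5 figure, only the ordering of the two distances.

```python
def l2_distance(approx, actual):
    """Sum of squared differences over every point where either pdf is nonzero.

    Points outside both supports contribute (0 - 0)^2 = 0, so skipping them
    leaves the sum unchanged.
    """
    support = set(approx) | set(actual)
    return sum((approx.get(x, 0.0) - actual.get(x, 0.0)) ** 2 for x in support)


# Equation 12: the true tuple t1 = (23, pneumonia).
g_true = {(23, "pneumonia"): 1.0}
# Equation 14: two equally likely diseases at the known age.
g_ana = {(23, "pneumonia"): 0.5, (23, "dyspepsia"): 0.5}
# Equation 13, assuming integer ages 21..60 (40 values), each with mass 1/40.
g_gen = {(age, "pneumonia"): 1 / 40 for age in range(21, 61)}

dist_ana = l2_distance(g_ana, g_true)
dist_gen = l2_distance(g_gen, g_true)
print(dist_ana)  # (0.5 - 1)^2 + (0.5 - 0)^2
print(dist_gen)  # (1/40 - 1)^2 + 39 * (1/40)^2 under the integer-age assumption
```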
5 Summary
In this chapter, we studied two anonymization frameworks for privacy-preserving data publication: generalization and anatomy. Generally speaking,