Privacy Preserving Publication: Anonymization Frameworks and Principles - Database Security: Applications and Trends

We can model $t_1$ using a probability density function (pdf) $\mathcal{G}_{t_1} : DS_{A,D} \rightarrow [0, 1]$. Specifically:

\[
\mathcal{G}_{t_1}(x) =
\begin{cases}
1 & \text{if } x = (t_1[A], t_1[D]) \\
0 & \text{otherwise}
\end{cases}
\tag{12}
\]

where $x$ is a 2D random variable in $DS_{A,D}$. Figure 2a demonstrates the pdf.

Assume that a researcher wants to re-construct an approximate pdf $\tilde{\mathcal{G}}^{\mathrm{gen}}_{t_1}$ of $t_1$ from the generalized Table 3b. From her/his perspective, $t_1[A]$ can be any value in the interval [21, 60] with equal probability 1/40, but $t_1[D]$ must be pneumonia. Hence,

\[
\tilde{\mathcal{G}}^{\mathrm{gen}}_{t_1}(x) =
\begin{cases}
1/40 & \text{if } x[A] \in [21, 60] \text{ and } x[D] = \text{pneumonia} \\
0 & \text{otherwise}
\end{cases}
\tag{13}
\]

which is illustrated in Figure 2b.

Instead, suppose that the researcher re-constructs a pdf $\tilde{\mathcal{G}}^{\mathrm{ana}}_{t_1}$ from the QIT and ST in Tables 4a and 4b. This time, s/he knows that $t_1[A]$ must be 23 (since age is published directly), but $t_1[D]$ can be pneumonia or dyspepsia with 50% probability (the ST shows that half of the tuples in QI-group 1 are associated with these two diseases, respectively). Therefore,

\[
\tilde{\mathcal{G}}^{\mathrm{ana}}_{t_1}(x) =
\begin{cases}
1/2 & \text{if } x = (23, \text{pneumonia}) \text{ or } x = (23, \text{dyspepsia}) \\
0 & \text{otherwise}
\end{cases}
\tag{14}
\]

as shown in Figure 2c. Obviously, the pdf approximated from the anatomized tables is more accurate than that (Figure 2b) from the generalized table.
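As a minimal sketch, the three pdfs of Equations 12 to 14 can be written as plain functions over a discretized version of the Age/Disease space. The integer Age domain, the two-value Disease domain, and every function name below are assumptions made for illustration; only the tuple (23, pneumonia) and the probabilities 1/40 and 1/2 come from the text.

```python
from fractions import Fraction

# t1 = (t1[A], t1[D]) as described in the text.
T1 = (23, "pneumonia")

# Assumed discretization: integer ages in [21, 60], two diseases.
DOMAIN = [(age, disease)
          for age in range(21, 61)
          for disease in ("pneumonia", "dyspepsia")]


def actual_pdf(x):
    """Equation 12: all probability mass sits on the true tuple."""
    return Fraction(1) if x == T1 else Fraction(0)


def generalized_pdf(x):
    """Equation 13: age is uniform over 40 values, disease is known."""
    age, disease = x
    if 21 <= age <= 60 and disease == "pneumonia":
        return Fraction(1, 40)
    return Fraction(0)


def anatomized_pdf(x):
    """Equation 14: age is known, disease is one of two candidates."""
    if x in ((23, "pneumonia"), (23, "dyspepsia")):
        return Fraction(1, 2)
    return Fraction(0)


def total_mass(pdf):
    """Sum a pdf over the assumed domain; a valid pdf sums to 1."""
    return sum(pdf(x) for x in DOMAIN)


if __name__ == "__main__":
    for name, pdf in [("actual", actual_pdf),
                      ("generalized", generalized_pdf),
                      ("anatomized", anatomized_pdf)]:
        print(name, "total mass:", total_mass(pdf), "mass on t1:", pdf(T1))
```

Exact fractions are used instead of floats so that the mass placed on each point can be compared without rounding error.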

Towards a more rigorous comparison, given an approximate pdf $\tilde{\mathcal{G}}_{t_1}$ (Equation 13 or 14), a natural way of quantifying its approximation quality is to calculate its "$L_2$ distance" from the actual pdf $\mathcal{G}_{t_1}$ (Equation 12):

\[
\sum_{x \in DS_{A,D}} \left( \tilde{\mathcal{G}}_{t_1}(x) - \mathcal{G}_{t_1}(x) \right)^2
\tag{15}
\]

The distance of $\tilde{\mathcal{G}}^{\mathrm{ana}}_{t_1}$ is 0.5, indeed significantly lower than the distance 22.5 of $\tilde{\mathcal{G}}^{\mathrm{gen}}_{t_1}$. Although we focused on $t_1$, in the same way, it is easy to verify that the anatomized tables permit better re-construction of the pdfs of all tuples in Table 3a.
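The distance of Equation 15 can be checked numerically with a short sketch, assuming it is evaluated as a sum of squared differences over a discretized domain. The integer Age domain, the two-value Disease domain, and the helper names are assumptions for illustration. Only the anatomy value (0.5) and the ordering of the two distances are meant to be compared with the text: the value obtained for the generalized table depends on how the Age interval is discretized.

```python
from fractions import Fraction

T1 = (23, "pneumonia")  # the true tuple t1, as given in the text

# Assumed discretization of the Age/Disease space.
DOMAIN = [(age, disease)
          for age in range(21, 61)
          for disease in ("pneumonia", "dyspepsia")]


def actual_pdf(x):
    # Equation 12
    return Fraction(1) if x == T1 else Fraction(0)


def generalized_pdf(x):
    # Equation 13, with Age treated as 40 equally likely integers
    age, disease = x
    inside = 21 <= age <= 60 and disease == "pneumonia"
    return Fraction(1, 40) if inside else Fraction(0)


def anatomized_pdf(x):
    # Equation 14
    hit = x in ((23, "pneumonia"), (23, "dyspepsia"))
    return Fraction(1, 2) if hit else Fraction(0)


def l2_distance(approx, actual, domain):
    """Sum of squared differences between two pdfs over the domain."""
    return sum((approx(x) - actual(x)) ** 2 for x in domain)


ana_dist = l2_distance(anatomized_pdf, actual_pdf, DOMAIN)
gen_dist = l2_distance(generalized_pdf, actual_pdf, DOMAIN)

if __name__ == "__main__":
    print("anatomy distance:", float(ana_dist))
    print("generalization distance (under this discretization):",
          float(gen_dist))
    print("anatomy is closer:", ana_dist < gen_dist)
```

For the anatomized pdf, the two non-zero terms are (1/2 - 1)^2 at (23, pneumonia) and (1/2 - 0)^2 at (23, dyspepsia), which add up to the 0.5 quoted above.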

5 Summary

In this chapter, we studied two anonymization frameworks for privacy preserving data publication: generalization and anatomy. Generally speaking,
