Privacy Preserving Publication: Anonymization Frameworks and Principles - Database Security: Applications and Trends


tuple ID    Age   Sex   Zipcode   Disease
1 (Bob)     23    M     11000     pneumonia
2           27    M     13000     dyspepsia
3           35    M     59000     dyspepsia
4           59    M     12000     pneumonia
5           61    F     54000     flu
6           65    F     25000     gastritis
7 (Alice)   65    F     25000     flu
8           70    F     30000     bronchitis

(a) The microdata

tuple ID   Age        Sex   Zipcode          Disease
1          [21, 60]   M     [10001, 60000]   pneumonia
2          [21, 60]   M     [10001, 60000]   dyspepsia
3          [21, 60]   M     [10001, 60000]   dyspepsia
4          [21, 60]   M     [10001, 60000]   pneumonia
5          [61, 70]   F     [10001, 60000]   flu
6          [61, 70]   F     [10001, 60000]   gastritis
7          [61, 70]   F     [10001, 60000]   flu
8          [61, 70]   F     [10001, 60000]   bronchitis

(b) A 2-diverse table

Table 3. Another generalization example
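
As a quick sanity check on the caption of Table 3b, the following sketch tests whether its two QI-groups are 2-diverse. It assumes one common reading of l-diversity (not restated in this section): in every QI-group, the most frequent sensitive value may account for at most a 1/l fraction of the group's tuples. The names GROUPS and is_l_diverse are illustrative only.

```python
from collections import Counter

# QI-groups of Table 3b: generalized (Age, Sex, Zipcode) -> diseases in the group.
GROUPS = {
    ("[21, 60]", "M", "[10001, 60000]"): ["pneumonia", "dyspepsia", "dyspepsia", "pneumonia"],
    ("[61, 70]", "F", "[10001, 60000]"): ["flu", "gastritis", "flu", "bronchitis"],
}

def is_l_diverse(groups, l):
    """Assumed criterion: in every group, the most frequent sensitive value
    covers at most a 1/l fraction of the group's tuples."""
    for diseases in groups.values():
        most_common_count = Counter(diseases).most_common(1)[0][1]
        if most_common_count * l > len(diseases):
            return False
    return True

print(is_l_diverse(GROUPS, 2))  # True: each disease covers at most half of its group
print(is_l_diverse(GROUPS, 3))  # False: some disease covers more than a third
```

Under this assumed criterion the table is 2-diverse but not 3-diverse, because in each group one disease appears in two of the four tuples.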

4.1 Motivation

Although generalization preserves privacy, it often loses considerable information in the microdata, which severely compromises the accuracy of data analysis. We illustrate this by using the microdata in Table 3a and the 2-diverse generalization in Table 3b. Assume that a researcher wants to derive from this table an estimate for the following query:

A: SELECT COUNT(*) FROM Unknown-Microdata
   WHERE Disease = 'pneumonia' AND Age <= 30
   AND Zipcode IN [10001, 20000]
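
For reference, the exact answer to query A can be obtained by evaluating its three predicates directly on the microdata of Table 3a, which the researcher of course cannot see. The sketch below does this in plain Python; the names MICRODATA and count_query_a are illustrative, and the interval [10001, 20000] is assumed to be closed at both ends.

```python
# Microdata of Table 3a as (tuple_id, age, sex, zipcode, disease).
MICRODATA = [
    (1, 23, "M", 11000, "pneumonia"),
    (2, 27, "M", 13000, "dyspepsia"),
    (3, 35, "M", 59000, "dyspepsia"),
    (4, 59, "M", 12000, "pneumonia"),
    (5, 61, "F", 54000, "flu"),
    (6, 65, "F", 25000, "gastritis"),
    (7, 65, "F", 25000, "flu"),
    (8, 70, "F", 30000, "bronchitis"),
]

def count_query_a(rows):
    """Count tuples satisfying the three predicates of query A."""
    return sum(
        1
        for _tid, age, _sex, zipcode, disease in rows
        # Assumption: the zipcode interval is closed on both ends.
        if disease == "pneumonia" and age <= 30 and 10001 <= zipcode <= 20000
    )

exact_answer = count_query_a(MICRODATA)
print(exact_answer)  # 1: only tuple 1 (age 23, zipcode 11000) qualifies
```

Tuple 4 also has pneumonia, but its age of 59 fails the Age condition, so the actual result is 1. This is the value any estimate derived from Table 3b should be compared against.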

To illustrate how to process the query, Figure 1 shows a 2D space, where the x- and y-dimensions are Age and Zipcode, respectively. Each point denotes a tuple in the microdata of Table 3a. For example, the x- and y-coordinates of point 1 equal the age and zipcode of tuple 1, respectively. Rectangle R1 (or R2) is obtained from the generalized values in the first (or second) QI-group in Table 3b. For instance, the x- (y-) projection of R1 is the generalized age [21, 60] (zipcode [10001, 60000]) of tuples 1-4. Query A is represented as the shaded rectangle Q, whose projection on the x- (y-) dimension is decided by the range condition Age ≤ 30 (10001 ≤ Zipcode ≤ 20000).

Since the researcher sees only R1 and R2 (but not the points), s/he answers query A in a way similar to selectivity estimation on a multidimensional
