[Figure 1: tuples 1-8, the generalized rectangles R1 and R2, and the query region Q; x-axis: Age (20 to 70); y-axis ticks from 10k to 60k]
Fig. 1. The original and generalized data in the Age-Zipcode plane
histogram [16], as suggested in [9]. Clearly, as R2 is disjoint from Q, no tuple
in the second QI-group can satisfy the query. R1, however, intersects Q and
hence is examined as follows.
From the Disease-values in Table 3b, the researcher knows that 2 tuples in
the first QI-group are associated with pneumonia. It remains to calculate the
probability p that a tuple in the QI-group satisfies the range predicates of A,
or equivalently, that the tuple's point representation falls in Q (Figure 1). Once p is
available, the query answer can be estimated as 2p. Without additional knowledge,
the researcher assumes a uniform data distribution in R1, and computes
p as Area(R1 ∩ Q) / Area(R1) = 0.05. This value leads to an approximate
answer of 0.1, which, however, is ten times smaller than the actual query result of 1
(see Table 3a).
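The estimate above can be reproduced with a minimal Python sketch. The rectangle coordinates below are assumptions chosen only so that the area ratio comes out to 0.05; the exact bounds of R1, R2 and Q are not stated in this section. The names Rect and uniform_estimate are hypothetical helpers, not part of any published implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in the Age-Zipcode plane."""
    x_lo: float  # lower Age bound
    x_hi: float  # upper Age bound
    y_lo: float  # lower Zipcode bound
    y_hi: float  # upper Zipcode bound

    def area(self) -> float:
        # A rectangle with inverted bounds (empty intersection) has area 0.
        return max(0.0, self.x_hi - self.x_lo) * max(0.0, self.y_hi - self.y_lo)

    def intersect(self, other: "Rect") -> "Rect":
        return Rect(
            max(self.x_lo, other.x_lo), min(self.x_hi, other.x_hi),
            max(self.y_lo, other.y_lo), min(self.y_hi, other.y_hi),
        )


def uniform_estimate(group_rect: Rect, query_rect: Rect, matching_count: int) -> float:
    """Estimate the query answer for one QI-group under the uniformity assumption.

    matching_count is the number of tuples in the group that carry the
    queried sensitive value (2 pneumonia tuples in the first QI-group).
    """
    p = group_rect.intersect(query_rect).area() / group_rect.area()
    return matching_count * p


# Assumed coordinates (illustrative only, not taken from the figure).
R1 = Rect(20, 60, 10_000, 60_000)
R2 = Rect(60, 70, 10_000, 60_000)
Q = Rect(0, 30, 10_000, 20_000)

estimate_r1 = uniform_estimate(R1, Q, 2)  # p = 100000 / 2000000 = 0.05
estimate_r2 = uniform_estimate(R2, Q, 2)  # R2 is disjoint from Q, so p = 0
print(estimate_r1, estimate_r2)
```

Under these assumed bounds, the first group contributes 2p = 0.1 and the second contributes nothing, which matches the approximate answer discussed above.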
The gross error arises because the data distribution in R1
significantly deviates from uniformity. Nevertheless, given only the generalized
table, we cannot justify any other distribution assumption. This is an inherent
problem of generalization: it prevents an analyst from correctly understanding
the data distribution inside each QI-group.
4.2 Rationale of Anatomy
Anatomy overcomes the above defect of generalization by releasing the exact
QI-distribution without compromising the quality of privacy preservation.
Specifically, anatomy releases a quasi-identifier table (QIT) and a sensitive
table (ST), which separate QI-values from sensitive values. For example,
Tables 4a and 4b demonstrate the QIT and ST obtained from the microdata
Table 3a, respectively.
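As a rough illustration of this separation, the following Python sketch splits a microdata table into a QIT and an ST. Everything here is an assumption made for illustration: the rows are toy values (not the contents of Table 3a), the function name anatomize is hypothetical, and the ST layout (one row per group and sensitive value, with a count) is one plausible form rather than a confirmed description of Table 4b.

```python
from collections import Counter


def anatomize(rows, group_ids, qi_attrs, sensitive_attr):
    """Split microdata into a quasi-identifier table and a sensitive table.

    rows: list of dicts, one per tuple.
    group_ids: list giving the QI-group of each tuple (same order as rows).
    """
    qit = []
    st_counts = Counter()
    for row, gid in zip(rows, group_ids):
        # QIT keeps the exact QI-values plus the group membership only.
        qit_row = {attr: row[attr] for attr in qi_attrs}
        qit_row["Group-ID"] = gid
        qit.append(qit_row)
        # ST records how often each sensitive value occurs in each group.
        st_counts[(gid, row[sensitive_attr])] += 1
    st = [
        {"Group-ID": gid, sensitive_attr: value, "Count": count}
        for (gid, value), count in sorted(st_counts.items())
    ]
    return qit, st


# Toy microdata (hypothetical values for illustration).
rows = [
    {"Age": 23, "Zipcode": 11000, "Disease": "pneumonia"},
    {"Age": 27, "Zipcode": 13000, "Disease": "dyspepsia"},
    {"Age": 35, "Zipcode": 59000, "Disease": "dyspepsia"},
    {"Age": 59, "Zipcode": 12000, "Disease": "pneumonia"},
    {"Age": 61, "Zipcode": 54000, "Disease": "flu"},
    {"Age": 65, "Zipcode": 25000, "Disease": "gastritis"},
    {"Age": 65, "Zipcode": 25000, "Disease": "flu"},
    {"Age": 70, "Zipcode": 30000, "Disease": "bronchitis"},
]
group_ids = [1, 1, 1, 1, 2, 2, 2, 2]  # tuples 1-4 in group 1, tuples 5-8 in group 2

qit, st = anatomize(rows, group_ids, ["Age", "Zipcode"], "Disease")
for r in st:
    print(r)
```

The QIT rows carry no sensitive value and the ST rows carry no QI-value; the two tables are linked only through the group identifier.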
Construction of the anatomized tables can be (informally) understood as
follows. First, we partition the tuples of the microdata into several QI-groups,
based on a certain strategy. Here, following the grouping in Table 3b, let us
place tuples 1-4 (or 5-8) of Table 3a into QI-group 1 (or 2).
Then, we create the QIT. Specifically, for each tuple in Table 3a, the QIT
(Table 4a) includes all its exact QI-values, together with its group membership