Proprietary data                        | Anonymized data
Name  Age  Gender  Zip    Ailment       | Age      Gender  Zip    Ailment
John  20   M       92122  flu           | [20-25)  *       9212*  flu
Jane  22   F       92121  pneumonia     | [20-25)  *       9212*  pneumonia
Jack  26   M       92093  cold          | [25-30)  *       9209*  cold
Jill  29   F       92094  bronchitis    | [25-30)  *       9209*  bronchitis
Fig. 1. Anonymization in Example 9
Anonymization. The generalization function g defines an anonymizing
function A_g on R, which drops the ID attributes of each R-tuple, keeps the
sensitive attributes unchanged, and substitutes the QI attributes with the
result of g. If duplicates are created in this process, they are all preserved.
We have

$$A_g(R) := \{\{\, t : QI, S \;\mid\; r \in R,\ t[QI] = g(r[QI]) \wedge t[S] = r[S] \,\}\}$$

where t[X] denotes the projection of tuple t on attribute list X, and where
{{ }} denotes a multiset comprehension (which preserves duplicates, as opposed
to the set comprehension denoted by { }).
Example 9. In Figure 1, the proprietary table R on the left has ID attribute
Name, QI attributes Age, Gender, Zip, and S attribute Ailment. The table
on the right is its anonymization A_g(R), where g replaces age with the 5-year
interval it falls in, suppresses gender, and hides the least significant digit of
the zip code.
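The anonymization of Example 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout (Name, Age, Gender, Zip, Ailment), the function names `g` and `anonymize`, and the use of `collections.Counter` as the multiset are choices made here, not part of the original text.

```python
from collections import Counter

# Table R from Figure 1, assumed layout: (Name, Age, Gender, Zip, Ailment).
R = [
    ("John", 20, "M", "92122", "flu"),
    ("Jane", 22, "F", "92121", "pneumonia"),
    ("Jack", 26, "M", "92093", "cold"),
    ("Jill", 29, "F", "92094", "bronchitis"),
]

def g(qi):
    """Generalize a QI projection (age, gender, zip) as described in Example 9."""
    age, _gender, zip_code = qi
    low = (age // 5) * 5                     # start of the 5-year interval
    return (f"[{low}-{low + 5})",            # age replaced by its interval
            "*",                             # gender suppressed
            zip_code[:-1] + "*")             # last zip digit hidden

def anonymize(table):
    """A_g: drop the ID, generalize the QI attributes, keep S unchanged.

    A Counter plays the role of the multiset, so duplicates are preserved.
    """
    return Counter(g(row[1:4]) + (row[4],) for row in table)

anonymized = anonymize(R)
for row, count in anonymized.items():
    print(row, "x", count)
```

Each of the four output tuples occurs once here. If two patients shared the same generalized QI values and the same ailment, the corresponding tuple would simply be counted twice rather than collapsed.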
Given a tuple r ∈ R, the owner wishes to preserve the privacy of the
association between the identifier r[ID] and the sensitive attribute values
r[S]. Since the sensitive attributes are published in the clear, the attacker needs
to guess only r[ID]. Intuitively, the anonymization A_g “hides the identity
r[ID] in a crowd” of possible identities, forcing the attacker to guess among
them. The larger the crowd, the lower the chance of guessing right.
Equivalence under generalization. This crowd comprises the identities
of all tuples whose projection on the quasi-identifiers generalizes under g to
the same value. It is easy to see that the property of two tuples having the
same image of their QI projection under g is an equivalence relation. Denoting
with [r]_g the equivalence class of r, we have

$$[r]_g := \{\, r' \in R \;\mid\; g(r'[QI]) = g(r[QI]) \,\}.$$
In Example 9, the tuples of table R are partitioned by g into two
equivalence classes, one comprising the tuples for John and Jane, the other the
tuples for Jack and Jill.
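The partition into equivalence classes can be checked with a short Python sketch. As before, the tuple layout, the helper name `equivalence_classes`, and the encoding of g are assumptions made for illustration; the text only specifies what g does to each QI attribute.

```python
from collections import defaultdict

# Table R from Figure 1, assumed layout: (Name, Age, Gender, Zip, Ailment).
R = [
    ("John", 20, "M", "92122", "flu"),
    ("Jane", 22, "F", "92121", "pneumonia"),
    ("Jack", 26, "M", "92093", "cold"),
    ("Jill", 29, "F", "92094", "bronchitis"),
]

def g(qi):
    """Generalize (age, gender, zip) as described in Example 9."""
    age, _gender, zip_code = qi
    low = (age // 5) * 5
    return (f"[{low}-{low + 5})", "*", zip_code[:-1] + "*")

def equivalence_classes(table):
    """Group tuples whose QI projections have the same image under g.

    Returns a dict mapping each generalized QI value to the list of
    identifiers (names) in that class, i.e. the "crowd" for that value.
    """
    classes = defaultdict(list)
    for row in table:
        classes[g(row[1:4])].append(row[0])
    return dict(classes)

classes = equivalence_classes(R)
for image, names in classes.items():
    print(image, names, "crowd size:", len(names))
```

Grouping by the image under g is enough to obtain the classes, because two tuples are equivalent exactly when their generalized QI values coincide. On the table of Figure 1 this yields two classes of size two.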
Now consider a tuple t ∈ A_g(R) which is the image under A_g of some
tuple r ∈ R. When the attacker observes the occurrence of sensitive attribute