Databases Reference
In-Depth Information
Table 2. A 2-anonymous version of Table 1a
Sarah, the neighbor examines the generalized QI-values in Table 2. Obviously,
Tuple 1 cannot be owned by Sarah, since it Age -value [1, 10] does not cover 28.
By this reasoning, Tuples 7-10 (in the last QI-group) are the only candidates
for the tuple of Sarah. Unable to make additional inferences from here, the
adversary settles on a fuzzy fact: Sarah may have got flu , pneumonia ,or
Alzeimer's .
Unlike Table 1a which allows the adversary to exactly derive the private
value of Sarah, Table 2 offers stronger protection. Evidently, protection is
made possible by the fact that generalization prevents the unique association
of a tuple to its owner. Specifically, since Tuples 7-10 share the same QI
formats, with a random guess, an adversary has only a 1 / 4 chance of correctly
identifying Tuple 9 as the real tuple owned by Sarah.
Based on the above idea, Sweeney and Samarati [14] propose k-anonymity
as an anonymization principle to measure the degree of generalization. For-
mally, a generalized table is k-anonymous if each QI-group contains at least
k -tuples. Table 2 is 2-anonymous, since the smallest QI-group (involving the
first two rows) has a size 2. In general, a higher k provides stronger protection
because, in general, k -anonymity guarantees that an adversary has at most
1 /k probability of finding out the actual tuple owned by the victim individ-
ual. As a tradeoff, however, increasing k also brings down the utility of the
generalized table, since more information must be lost in generalization. This
can be easily understood through an extreme example: k =
.Inthatcase,
all the tuples in the microdata T must be included in a single QI-group. As a
result, each generalized value must be a (very!) long interval encompassing all
the values in T on the corresponding QI-attribute. For instance, if k equals
10 and T is Table 1a, after generalization each Age -value becomes an interval
that covers the range [5 , 56]. Releasing such an excessively generalized table
is hardly more useful than publishing only the Disease -column of T .
A large bulk of research has been devoted to developing algorithms
[3, 6, 7, 8, 9, 13, 15, 17, 19] for computing a k -anonymous version of T ,
|
T
|
Search WWH ::




Custom Search