(a result of data generalization) satisfies t-closeness if the distance between the
distribution of a sensitive attribute in this cluster and the distribution of this
attribute in the whole table T is no more than a threshold t. In that manner,
t-closeness may, in principle, prevent discrimination by making it impossible to
assert negative inferences about the sensitive attribute based on cluster membership
that would be stronger than the inferences available for the entire table
(the whole population). It is clear, however, that requiring t-closeness imposes a
very strong constraint on the generalization process, resulting in potentially very
significant distortion of the data, thereby decreasing the quality of the data (and of any
model obtained from it) unacceptably.
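To make the definition concrete, the sketch below checks t-closeness for one cluster (equivalence class). t-closeness is usually instantiated with the Earth Mover's Distance; the sketch substitutes the simpler total variation distance as a stand-in, and the attribute values, threshold, and function names are illustrative only.

```python
from collections import Counter

def distribution(values):
    """Relative frequency of each value of the sensitive attribute."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def variational_distance(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(cluster_values, table_values, t):
    """True if the cluster's sensitive-attribute distribution is within t of the table's."""
    return variational_distance(distribution(cluster_values),
                                distribution(table_values)) <= t

# Hypothetical data: 'disease' is the sensitive attribute.
table = ["flu", "flu", "cancer", "flu", "hiv", "flu", "cancer", "flu"]
cluster = ["cancer", "hiv", "cancer"]  # one equivalence class after generalization
print(satisfies_t_closeness(cluster, table, t=0.3))  # False: this cluster is too skewed
```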
It is worth observing that the attack model behind data k-anonymity is somewhat
unrealistic. It assumes that the attacker has total knowledge of all attribute
values for a given instance, which will normally not be the case. Starting
with this observation, more realistic models have been proposed. For instance, in
(Mohammed, Fung et al. 2009) the attack model assumes that the attacker's knowledge
is limited to L quasi-identifiers, and the k-anonymization is limited to those
identifiers.
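A minimal illustration of such a relaxed requirement, under the assumption that the attacker knows at most L quasi-identifier values, is to require that every combination of values over any subset of at most L quasi-identifiers be shared by at least k records. This is only a sketch of the idea, not the exact definition used in the cited work; the record layout and names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def satisfies_limited_k_anonymity(records, quasi_identifiers, L, k):
    """Check that any attacker knowing at most L quasi-identifier values
    still matches at least k records."""
    for size in range(1, L + 1):
        for qid_subset in combinations(quasi_identifiers, size):
            counts = Counter(tuple(r[q] for q in qid_subset) for r in records)
            if any(c < k for c in counts.values()):
                return False
    return True

# Hypothetical generalized records with three quasi-identifiers.
records = [
    {"age": "30-39", "zip": "021**", "sex": "F"},
    {"age": "30-39", "zip": "021**", "sex": "F"},
    {"age": "30-39", "zip": "021**", "sex": "M"},
    {"age": "30-39", "zip": "021**", "sex": "M"},
]
print(satisfies_limited_k_anonymity(records, ["age", "zip", "sex"], L=2, k=2))  # True
```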
k-anonymization is often the method of choice in data publishing, particularly
for medical data. The reason is that, unlike the perturbative methods discussed
in the next section, the approach does not distort the data: even the generalized
data is "true", i.e. it represents true (even though possibly imprecise) statements
about the original data.
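The following toy sketch illustrates what such a truth-preserving generalization might look like, assuming decade-wide age bands and ZIP-code prefixes; the scheme and field names are purely illustrative and not taken from any cited work.

```python
def generalize_record(record):
    """Coarsen the quasi-identifiers; the result is less precise but still true."""
    decade = (record["age"] // 10) * 10
    age_band = f"{decade}-{decade + 9}"
    zip_prefix = record["zip"][:3] + "**"
    return {"age": age_band, "zip": zip_prefix, "disease": record["disease"]}

original = {"age": 34, "zip": "02139", "disease": "flu"}
print(generalize_record(original))  # {'age': '30-39', 'zip': '021**', 'disease': 'flu'}
```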
A completely different identity disclosure attack is possible when a model
built using data mining techniques such as classification or association rules is so
granular (on a specific data set) that it identifies a specific individual. Publishing
such a model alone, even without access to the data from which it has been obtained,
would then disclose the data values that the model represents for that specific
individual. Rule hiding is an approach attempting to solve this problem. For instance,
(Verykios, Elmagarmid et al. 2004) present strategies preventing association rules
with a sensitive attribute in the consequent from being produced by association
rule mining algorithms; these strategies are based on reducing the support and
confidence of rules with such attributes in the consequent. Another approach to
rule hiding is described in (Oliveira, Zaïane et al. 2004). (Atzori, Bonchi et al.
2008) show how such disclosure can be avoided by elegantly generalizing to models
the concept of k-anonymity discussed above for the data.
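As a rough illustration of the support/confidence-reduction idea (not the specific heuristics of the cited papers), the sketch below greedily removes a sensitive consequent item from supporting transactions until the rule can no longer be mined at the given thresholds; the transaction data, item names, and thresholds are invented for the example.

```python
def support(transactions, itemset):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    sup_a = support(transactions, antecedent)
    return support(transactions, set(antecedent) | set(consequent)) / sup_a if sup_a else 0.0

def hide_rule(transactions, antecedent, consequent, min_sup, min_conf):
    """Greedy sketch: drop the sensitive consequent from supporting transactions
    until the rule falls below the mining thresholds."""
    antecedent, consequent = set(antecedent), set(consequent)
    for t in transactions:
        if (support(transactions, antecedent | consequent) < min_sup or
                confidence(transactions, antecedent, consequent) < min_conf):
            break
        if antecedent | consequent <= t:
            t -= consequent  # in-place: remove the sensitive item from this transaction
    return transactions

data = [{"bread", "milk", "hiv_drug"},
        {"bread", "milk", "hiv_drug"},
        {"bread", "milk"},
        {"bread", "eggs"}]
hide_rule(data, {"bread", "milk"}, {"hiv_drug"}, min_sup=0.4, min_conf=0.6)
print(confidence(data, {"bread", "milk"}, {"hiv_drug"}))  # reduced to ~0.33
```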
11.3 Attribute Disclosure
A different set of methods protecting against disclosure of the value of a sensitive
attribute are the perturbative methods. They implement the "camouflage" paradigm.
The seminal work in this area is due to (Agrawal and Srikant 2000). The
main idea is simple: an attribute (say, the j-th column in T) is systematically
changed by adding to each a_ij, i = 1…n, a value obtained from a probability
distribution.
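A minimal sketch of this additive-noise idea, assuming Gaussian noise with an illustrative standard deviation (the choice of distribution, parameters, and function names is not taken from the cited work):

```python
import random

def perturb_column(values, sigma=5.0, seed=0):
    """Add independent Gaussian noise to every value of one numeric attribute."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]

ages = [23, 31, 45, 52, 38]
print(perturb_column(ages))  # individual values are masked; aggregate shape is roughly preserved
```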