questions awaiting solutions and the forthcoming challenges. For a more technical
and a more complete presentation, the reader may consult (Vaidya, Zhu et al. 2006), or the more recent, in-depth technical tutorials (Fung, Wang et al. 2010; Chen, Kifer et al. 2009).
Data privacy is often seen as an aspect of, or an appendix to, data security. This is not a correct view, as the goals of the two fields are divergent. On the one hand, security protects the data against unauthorized access, e.g. reading the data while it is transmitted across a network. But once the data reaches an authorized recipient, security imposes no additional constraints on revealing personal information about an individual. That, on the other hand, is the goal of data privacy. This divergence of goals is well illustrated by public key cryptography, which protects data encrypted using a person's private key, but also ties the data tightly to the individual whose public key is used to decipher it, thereby identifying that individual. It is therefore correct to describe the relationship between data security and data privacy as the former being a prerequisite of the latter. Data must be protected in storage and transmission by data security methods (e.g. cryptographic techniques), but if data privacy is a goal, then additional steps, some of them described below, must be taken to protect the privacy of the individuals represented in the data.
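The public-key point can be made concrete with a digital signature, the standard mechanism behind "encrypting with a private key": anyone holding the matching public key can verify which key holder produced a record, so the security mechanism itself binds the data to an identity without concealing its content. The following is a minimal sketch using the Python cryptography package with RSA-PSS; the record content is invented for illustration.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Hypothetical record; the field values are invented for illustration.
record = b"patient=Alice Example; birth_year=1978; diagnosis=asthma"

# The data owner signs the record with her private key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
signature = private_key.sign(
    record,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

# Anyone with the matching public key can verify the signature, i.e. attribute the
# record to that specific key holder; verify() raises InvalidSignature otherwise.
private_key.public_key().verify(
    signature,
    record,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)
print("record verifiably linked to the signer's key pair")

The sketch shows the tension described above: the signature adds security (integrity, authenticity) while simultaneously strengthening the link between the data and a specific individual.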
Before reviewing current work in PPDM, we need to establish dimensions that will structure this review. In order to identify those dimensions, we need to ground the discussion in the process that PPDM addresses, namely sharing data and the results of a data mining operation between users u_1, …, u_m, m ≥ 2. Furthermore, it is useful to view the data as a database of n records, each consisting of l fields, where each record represents an individual i_i and describes i_i in terms of its fields. The usual simplified representation is a table T, in which rows represent individuals i_1, …, i_n, and columns - referred to as attributes - represent the fields a_1, …, a_l. This assumes a fixed representation, i.e. each individual is represented by a vector of values of a_1, …, a_l.
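As a point of reference, the following minimal Python sketch shows this fixed representation of T; the attribute names and values are invented for illustration only.

# Table T with n rows (individuals i_1, ..., i_n) and l columns (attributes a_1, ..., a_l).
attributes = ["zip", "birth_year", "sex", "diagnosis"]   # a_1, ..., a_l
T = [
    ["45123", 1978, "F", "asthma"],    # record of individual i_1
    ["45127", 1982, "M", "diabetes"],  # record of individual i_2
    ["45123", 1978, "F", "flu"],       # record of individual i_3
]

n, l = len(T), len(attributes)                       # n records, l attributes
a_ij = T[0][attributes.index("diagnosis")]           # value a_ij of attribute a_j for i_i
print(n, l, a_ij)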
For a holistic view of PPDM, the first useful dimension is to consider privacy in terms of what is being protected, or conversely - what an attacker wants to obtain from T. The second useful dimension is the ownership structure of the data - does it belong to one entity and have to be shared with another entity (m = 2), or is it built from parts owned by different entities? We therefore propose to consider the following dimensions:
What is being protected:
o the data: an attacker, given T,
  - will not be able to link any row in T to a specific individual i_i [identity disclosure]
  - will not be able to obtain the value a_ij of a sensitive attribute a_j of i_i [attribute disclosure]
  (both forms of disclosure are illustrated by the sketch following this list)
o the inferred data mining result: an attacker, not knowing T but given the results of the data mining operation, e.g. an association rule learned from T, will not be able to identify attributes of a specific i_i [model-based identity disclosure]
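The following sketch, again with invented data, shows how the first two kinds of disclosure arise in practice: joining the released table T with a public register on a few quasi-identifying attributes re-attaches a name to a row of T (identity disclosure) and, with it, exposes the value of a sensitive attribute (attribute disclosure).

import pandas as pd

# Released table T with direct identifiers removed; values are invented.
T = pd.DataFrame([
    {"zip": "45123", "birth_year": 1978, "sex": "F", "diagnosis": "asthma"},
    {"zip": "45127", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
])

# A public register the attacker may hold (e.g. a voter list); also invented.
register = pd.DataFrame([
    {"name": "Alice Example", "zip": "45123", "birth_year": 1978, "sex": "F"},
])

# Joining on the quasi-identifiers links a named person to a row of T
# (identity disclosure) and reveals her sensitive attribute (attribute disclosure).
linked = register.merge(T, on=["zip", "birth_year", "sex"])
print(linked[["name", "diagnosis"]])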