reasonably to be used for identification []. This is particularly relevant in the case
of statistical information, where despite the fact that the information may be
presented as aggregated data, the original sample is not sufficiently large and other
pieces of information may enable the identification of individuals.' 25 This refers,
among other things, to techniques used in the data mining process.
The data minimization principles are also often referred to in technical literature,
where they are commonly captured in the phrase 'input privacy data mining'. First, a
limitation may be placed on the inclusion in databases of information related to
privacy-sensitive or discrimination-sensitive data. Second, limitations may be placed
on the use of such data for data mining practices, for example through cell
suppression and through restricting access to statistical queries that may reveal
confidential information. 26 The main goal of 'input privacy data mining' is to
minimize the amount of sensitive data while still allowing for an equally valuable
data mining process: the so-called 'no-outcome-change' property. 27
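The two input-side controls named above, cell suppression and restricting statistical queries, can be illustrated with a minimal sketch. The threshold of 5, the function names and the record layout are assumptions made for illustration only; they are not taken from the cited literature, and real disclosure-control policies set their own limits.

```python
from collections import Counter

# Hypothetical minimum group size; chosen here only for illustration.
MIN_CELL_SIZE = 5


def suppressed_counts(records, group_key, min_cell_size=MIN_CELL_SIZE):
    """Cell suppression: count records per group, hiding small cells as None."""
    counts = Counter(record[group_key] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


def answer_count_query(records, predicate, min_cell_size=MIN_CELL_SIZE):
    """Query restriction: answer a count query only when both the matching
    set and its complement contain at least min_cell_size records."""
    matched = sum(1 for record in records if predicate(record))
    if matched < min_cell_size or len(records) - matched < min_cell_size:
        return None  # refuse to answer; the result could single people out
    return matched


# Illustrative data: 6 records in "north", 2 in "south", 7 in "east".
records = (
    [{"region": "north"}] * 6
    + [{"region": "south"}] * 2
    + [{"region": "east"}] * 7
)

print(suppressed_counts(records, "region"))
print(answer_count_query(records, lambda r: r["region"] == "south"))
```

In this sketch the "south" cell is withheld because it contains only two records, which is the situation the Working Party quotation warns about: aggregated figures drawn from a sample that is too small. The complement check matters because a count that covers almost everyone reveals the few who are left out.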
Somewhat less well known and less practiced is the concept of 'output privacy data
mining'. 28 This does not refer to the inclusion of data in the database or the use of
particular data in data mining processes, but to the use of data in the outcome of
this process, for example in the rule, pattern or profile distilled from the data. 29
The reason for this additional instrument is that 'input privacy data mining' is not
always sufficient to exclude privacy violations or discriminatory results. 30 This may
be caused by masking, indirect discrimination or re-identification, but may also be
due to the fact that, even though no sensitive data was used in the data mining
process, the eventual outcome may still be discriminatory or violate someone's
privacy. 31 To address outcome-based problems, technical solutions may be implemented
to prevent particular data from being used in actual practices and decisions.
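One way to picture such an output-side control is a filter applied to the mined rules before they are used in decisions. The sketch below is an assumption-laden illustration, not a method from the cited sources: the `Rule` type, the attribute names and the choice of "postcode" as a proxy attribute are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A mined rule: if all antecedent attributes match, predict the consequent."""
    antecedent: frozenset  # names of the attributes the rule conditions on
    consequent: str


# Hypothetical attribute lists, chosen only for illustration.
SENSITIVE = frozenset({"ethnicity", "religion"})
PROXIES = frozenset({"postcode"})  # may indirectly reveal a sensitive attribute


def filter_rules(rules, blocked=SENSITIVE | PROXIES):
    """Keep only rules that condition on no blocked attribute."""
    return [rule for rule in rules if rule.antecedent.isdisjoint(blocked)]


mined = [
    Rule(frozenset({"income"}), "approve"),
    Rule(frozenset({"postcode", "income"}), "deny"),
]

for rule in filter_rules(mined):
    print(sorted(rule.antecedent), "->", rule.consequent)
```

Here the rule that conditions on the proxy attribute is withheld even though no sensitive attribute appears in it, which reflects the point made above about indirect discrimination. A filter of this kind depends entirely on the blocked list being complete, so it does not by itself rule out discriminatory outcomes.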
15.6 Loss of Contextuality
The principles of data minimization described above help to minimize both the risk
and the scale of damage if, for example, data is misused or a data leak occurs. They
may also limit the use of particular compromising data in actual practices and
decisions. There are, however, several downsides to using this technique. Firstly, the
dataset may lose part of its value through this process. 'From a data mining
perspective the primary issue with informational privacy is that by limiting the use
of (particular) personal data, we run the risk of reducing the accuracy of the data
mining exercise. So while privacy may be protected, the utility of the data mining
25 Working Party (2007), p. 21.
26 Ruggieri, Pedreschi & Turini (2010); Pedreschi, Ruggieri & Turini (2008); Custers
(2004).
27 Bu et al. (2007).
28 Wang & Liu (2008).
29 Verykios et al. (2004).
30 Kantarcıoğlu, Jin & Clifton (2004).
31 Porter (2008).