like 'need to know', 'select before you collect', and many of the OECD privacy
principles 6 (including the purpose specification principle, see below).
However, these mechanisms for limiting the collection and distribution of
information are failing, for several reasons. First, from a practical perspective,
informational self-determination is complicated because people often do not know
who collects and processes their personal data. This is mainly because most
personal data are no longer collected directly, i.e., by asking data subjects for
the data, but indirectly, for instance, by sharing or buying datasets or by coupling
databases. When data are collected indirectly, it is far more difficult for data
subjects to know who is processing their personal data and to exercise any form of
control over it.
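Coupling databases, one of the indirect forms of collection mentioned above, can be illustrated with a minimal sketch. The two datasets, their field names, and the choice of postcode and birth year as linking keys are all hypothetical; the point is only that an ordinary join attaches information to a named person who never supplied it to the party doing the linking.

```python
# Hypothetical illustration of indirect collection by coupling two databases.
# All datasets, field names and values are invented for this example.
from typing import Dict, List, Tuple

# Dataset A: e.g. purchase records bought from a retailer (hypothetical).
purchases: List[Dict[str, str]] = [
    {"postcode": "1234AB", "birth_year": "1980", "item": "insulin"},
    {"postcode": "5678CD", "birth_year": "1975", "item": "diapers"},
]

# Dataset B: e.g. a shared register that contains names (hypothetical).
register: List[Dict[str, str]] = [
    {"name": "J. Jansen", "postcode": "1234AB", "birth_year": "1980"},
    {"name": "P. de Vries", "postcode": "5678CD", "birth_year": "1975"},
]


def couple(left: List[Dict[str, str]], right: List[Dict[str, str]],
           keys: Tuple[str, ...]) -> List[Dict[str, str]]:
    """Inner-join two lists of records on the given key fields."""
    index: Dict[Tuple[str, ...], List[Dict[str, str]]] = {}
    for rec in right:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    linked = []
    for rec in left:
        for match in index.get(tuple(rec[k] for k in keys), []):
            linked.append({**match, **rec})  # merged record now carries a name
    return linked


linked = couple(purchases, register, keys=("postcode", "birth_year"))
for row in linked:
    print(row["name"], "->", row["item"])
```

Neither source dataset links a name to a purchase on its own; the link exists only after coupling, which is why the data subject is unlikely to know it was made.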
Second, as early as 1948, it was shown that the dissemination of information
follows the rules of entropy. 7 Basically, this means that it is easy to spread
information, but very difficult to withdraw it from the public sphere. In the
information society this is more obvious than ever: it takes only two mouse clicks
to copy information and send it to dozens of people (or many more). Since the
spreading of information always proceeds in one direction (towards greater
entropy), principles focusing on access controls are increasingly inadequate in a
world of automated and interlinked databases and information networks, in which
individuals are rapidly losing their grip on who is using their information and for
what purposes. Because of this movement towards greater entropy, it may be
difficult for people to know where their information will end up. This is, in fact,
an argument for greater control over information. However, according to the rules
of entropy, such control can at most stop or slow down the increase of entropy; it
is impossible to reverse that increase. 8
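To make the entropy measure of footnote 7 concrete, here is a small sketch that computes H(X) in bits. Reading X as "the holder at which a copy of a data item is found" is a hypothetical interpretation introduced only for illustration; the text itself does not model dissemination this precisely.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits: H(X) = -sum p(X_i) * log2 p(X_i).

    Written as sum p * log2(1/p), which is the same quantity. Terms with
    p = 0 are skipped, since p * log2(1/p) tends to 0 as p -> 0.
    """
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


# Hypothetical reading: X = which holder a copy of the data is found at.
print(entropy([1.0]))       # a single holder: no uncertainty
for n in (2, 8, 64):        # spread evenly over n holders: log2(n) bits
    print(n, entropy([1 / n] * n))
print(entropy([0.9, 0.1]))  # mostly, but not entirely, in one place
```

Under this reading, information held by one party has zero entropy, and spreading it evenly over n parties raises the entropy to log2 n. Removing copies lowers it again, in line with footnote 8, but only for as long as no new copies are made.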
Third, throughout this topic, it was shown that data mining technologies are
useful tools for profiling, i.e., ascribing characteristics to individuals or groups of
people. Most data mining technologies are very good at dealing with datasets that
are incomplete or incorrect. Missing data generally do not constitute a problem
when searching for patterns, as long as the total amount of missing data is not too
large compared to the amount of data available. Hence, with the help of data
mining predictions, the blanks (missing data) can easily be completed in datasets.
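How predictions fill in blanks can be shown with a deliberately simple stand-in for real data mining models: a majority vote among complete records that agree with the incomplete record on every attribute it does have. The function name `impute`, the attributes, and the records are hypothetical; practical systems use richer predictive models, but the principle of inferring a missing value from patterns in other people's data is the same.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]  # None marks a missing value


def impute(records: List[Record], target: str) -> List[Record]:
    """Fill missing values of `target` by majority vote among complete records
    that match the incomplete record on all of its other known attributes.
    Falls back to the overall most common value if no record matches."""
    complete = [r for r in records if r.get(target) is not None]
    if not complete:
        raise ValueError("no complete records to learn from")
    overall = Counter(r[target] for r in complete).most_common(1)[0][0]
    filled: List[Record] = []
    for rec in records:
        if rec.get(target) is not None:
            filled.append(dict(rec))
            continue
        known = {k: v for k, v in rec.items() if k != target and v is not None}
        votes = Counter(
            r[target] for r in complete
            if all(r.get(k) == v for k, v in known.items())
        )
        guess = votes.most_common(1)[0][0] if votes else overall
        filled.append({**rec, target: guess})  # new dict; input left unchanged
    return filled


# Hypothetical dataset in which one person's income is unknown.
people: List[Record] = [
    {"postcode": "1234", "car": "yes", "income": "high"},
    {"postcode": "1234", "car": "yes", "income": "high"},
    {"postcode": "1234", "car": "no", "income": "low"},
    {"postcode": "5678", "car": "no", "income": "low"},
    {"postcode": "1234", "car": "yes", "income": None},
]
for row in impute(people, "income"):
    print(row)
```

The last person never disclosed an income, yet the completed dataset ascribes one to them based on people who resemble them, which is exactly the kind of profiling the paragraph describes.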
6 See the 1980 principles for fair information processing developed by the
Organization for Economic Co-operation and Development (OECD).
See http://www1.oecd.org/dsti/sti/it/secur/prod/PRIV-EN.HTM.
7 Shannon, C.E. (1948) and Shannon, C.E. (1949). The entropy of data item X is expressed
as $H(X) = -\sum_{i=1}^{n} p(X_i)\log_2 p(X_i)$, with $X_1,\ldots,X_n$ the n possible values
with probabilities $p(X_1),\ldots,p(X_n)$, where $\sum_{i=1}^{n} p(X_i) = 1$; see also
Denning, D.E. (1983), p. 17.
8 Deleting data from databases does in fact decrease the entropy, but when copies of
particular data remain, sooner or later the entropy may increase again.