Profiles based on statistical (but not necessarily causal) relationships may give rise
to problems, such as the aforementioned masking, that differ from the problems of
profiles based on causal relations. Statistical results of data mining are often used
as a starting point for finding underlying causality, but it is important to note that
merely statistical relations may already be sufficient to act upon, for instance,
when screening for diseases. The automated generation of hypotheses also
contributes to the scale argument: the number of profiles increases considerably
because non-causal relations can be found too.
A third difference between data mining and empirical statistical research is that,
with the help of data mining, trivial information may be linked (sometimes
unintentionally) to sensitive information. Suppose data mining reveals a relation
between driving a red car and developing colon cancer. In that case, a trivial piece of
information, the color of a person's car, becomes indicative of his or her health,
which is sensitive information. Here the lack of transparency of data mining may
come to play an important role: people who provide only trivial information may be
unaware that they are also providing sensitive information about themselves when
they belong to a group of people about whom sensitive information is known.
People may not even know which groups they belong to.
A fourth difference lies in a characteristic of information and communication
technology that is usually referred to as the 'lack of forgetfulness of information
technology'. 31,32 Once a piece of information has been disclosed, it is practically
impossible to withdraw it. Computer systems do not forget things unless
information is explicitly deleted, and even then the information can often be
retrieved. 33 Because it is often difficult to keep information contained, it may spread
through computer systems by being copied and distributed, which makes it difficult
to trace and delete every copy. This technological characteristic calls for a different
approach to solving the problems of profiling and data mining.
1.3.2 Problems and Solutions
This is a topic about discrimination and privacy, which makes it a topic about
problems. However, instead of only discussing problems, we also provide
solutions, or directions for solutions, to these problems. If data mining and profiling
have undesirable effects, they may be regulated in several ways. Lessig distinguishes
four different elements that regulate behavior. 34 For most people, the first thing that
comes to mind is legal constraints. Laws may regulate where, when, and by whom
data mining and profiling are allowed, and under which conditions and
circumstances. They operate as a constraint on everyone who wants to use data
mining and profiling.
31 Blanchette, J.F., and Johnson, D.G. (1998).
32 For this argument, it should be noted that data mining is regarded as an information
technology, unlike empirical statistical research.
33 It may be argued that paper files do not 'forget' either, but paper files are, in general, less
accessible and thus there is generally less spreading of the information they contain.
34 Lessig, L. (2006).