this topic provides technological solutions, particularly discrimination-aware and
privacy-preserving data mining techniques. Fourth, this topic explains state-of-the-art
technologies, an advantage over topics published before, even though we
realize that technology develops so quickly that this topic, too, will be outdated
within a few years.
A New Technology
Profiles were used and applied in the past without data mining, for instance, by
(human) observation or by empirical statistical research. Attempts were often
made to distinguish particular individuals or groups and investigate their
characteristics. Thus, it may be asked: what is new about profiling by means of
data mining? Is it not true that we have always drawn distinctions between
people?
Profiling by means of data mining may raise problems that are different from
the problems that may be raised by other forms of statistical profiling such as
taking test samples, mainly because data mining generates hypotheses itself.
Empirical statistical research with self-chosen hypotheses may be referred to as
primary data analysis, whereas the automated generating and testing of
hypotheses, as with data mining, may be referred to as secondary data analysis. In
the automated generating of hypotheses, the known problems of profiling may be
more severe and new types of problems may arise that are related to profiling
using data mining. 29 There are four reasons why profiling using data mining may
be different from traditional profiling.
The first reason why profiling using data mining may cause more serious
problems is a scale argument. Testing twice as many hypotheses with empirical
research implies doubling the number of researchers. Data mining is an automated
analysis and does not require doubling the number of researchers. In fact, data
mining enables testing large numbers (hundreds or thousands) of hypotheses (even
though only a very small percentage of the results may be useful). There may be
an overload of profiles. 30 Although this scale argument indicates that the known
problems of group profiling are more severe, it does not necessarily imply new
problems.
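The scale argument can be made concrete with a small count. The following is a minimal sketch, not taken from the text: it assumes that each candidate hypothesis is a possible relation between two attributes of a data set, and the function name candidate_pair_hypotheses is hypothetical. It only illustrates how quickly the number of testable hypotheses grows once generating them is automated.

```python
from math import comb


def candidate_pair_hypotheses(n_attributes: int) -> int:
    """Count the distinct attribute pairs that could be tested for a relation.

    Assumption for illustration: one hypothesis per unordered pair of attributes.
    """
    return comb(n_attributes, 2)


# Print the number of pairwise hypotheses for a few data set widths.
for n in (10, 50, 100):
    print(f"{n} attributes: {candidate_pair_hypotheses(n)} candidate pairs")
```

Under this assumption, a data set with 100 attributes already yields several thousand pairwise hypotheses, which matches the order of magnitude mentioned above. Relations between three or more attributes would increase the count further.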
A second difference is that, in data mining, depending on the technique that is
used, every possible relation can be investigated, while, in empirical statistical
research, usually only causal relationships are considered. The relations found
using data mining are not necessarily causal. Or they may be causal without being
understood. In this way, the scope of profiles that are discovered may be much
broader (only a small minority of all statistical relations is directly causal) with
unexpected profiles in unexpected areas. Data mining is not dependent on
coincidence. Data-mining tools automatically generate hypotheses, independent of
whether a relationship is (expected to be) causal or not.
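The automated generating and testing of hypotheses can be sketched as follows. This is an illustration under stated assumptions, not the method of any particular data-mining tool: the data are synthetic, the attribute names (age, years_licensed, shoe_size) are hypothetical, the threshold of 0.5 is arbitrary, and Pearson correlation stands in for whatever measure a real tool would use. The point is that every pair of attributes is tested, without anyone first asking whether a causal relation is expected.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no measurable linear relation
    return cov / sqrt(var_x * var_y)


def mine_pairwise_relations(table, threshold=0.5):
    """Test every attribute pair and keep those with |r| >= threshold.

    No prior hypothesis is supplied: the loop itself generates them.
    """
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, r))
    return found


# Synthetic data (assumption): years_licensed depends on age, shoe_size does not.
random.seed(0)
n_rows = 200
age = [random.uniform(18, 80) for _ in range(n_rows)]
table = {
    "age": age,
    "years_licensed": [a - 18 + random.gauss(0, 2) for a in age],
    "shoe_size": [random.gauss(42, 2) for _ in range(n_rows)],
}

found = mine_pairwise_relations(table)
for a, b, r in found:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

The reported relation is purely statistical. The code has no notion of whether one attribute causes the other, which is the difference from hypothesis-driven empirical research described above.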
29 A distinction may be made between technology-specific and technology-enhanced
concerns, because technology-specific concerns usually require new solutions, while
conventional solutions may suffice for the technology-enhanced concerns. See also
Tavani, H. (1999).
30 See also Mitchell, T.M. (1999) and Bygrave, L.A. (2002), p. 301.