criticized. A more thorough understanding of all these disciplines may help
foster responsible innovation and technology use.
1.2 Data Mining and Profiling
This topic addresses the effects of data mining and profiling, two technologies that
are no longer new but are still subject to constant technological development. Data
mining and profiling are often mentioned in the same breath, and they are frequently
used together, but they may be considered separate technologies.
Profiling may be carried out without the use of data mining and vice versa. In
some cases, profiling may not even involve (much) technology, for instance, when
psychologically profiling a serial killer. There are many definitions of data mining
and profiling. The focus of this topic is not on definitions, but a description of
what we mean by these terms may nevertheless be useful.
Before starting, it is important to note that data mining goes beyond mere
statistical analysis. Although data mining results in statistical patterns, it
differs from traditional statistical methods, such as taking test samples.[7] Data
mining deals with large databases that may contain millions of records.
Statisticians, however, are used to a lack of data rather than an abundance of it.
The large amounts of data and the way the data is stored make straightforward
statistical methods inapplicable. Most statistical methods also require clean data,
but in large databases it is unavoidable that some of the data is invalid. Moreover,
some of the data may not even be numerical, such as image, audio, text, and
geographical data, and for such data types some statistical operations are not
allowed. Furthermore, traditional statistical analysis usually begins with a
hypothesis that is tested against the available data. Data mining tools usually
generate hypotheses themselves and then test these hypotheses against the
available data.
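The contrast between testing one given hypothesis and generating hypotheses from the data can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular data mining tool works: the records, attribute names, thresholds, and the function `generate_hypotheses` are all hypothetical, and the technique shown (a brute-force, association-rule style enumeration) is just one example chosen for brevity. Instead of starting from a single hypothesis, the function proposes every rule of the form "attribute A has value a, so attribute B has value b" and keeps only the rules the data supports. Records with an invalid (missing) value are skipped rather than assumed to be clean.

```python
from itertools import combinations

# Hypothetical toy records; None marks an invalid or missing value,
# which a large real-world database will inevitably contain.
records = [
    {"age_group": "young", "region": "north", "buys": "yes"},
    {"age_group": "young", "region": "north", "buys": "yes"},
    {"age_group": "young", "region": "south", "buys": "yes"},
    {"age_group": "old", "region": "south", "buys": "no"},
    {"age_group": "old", "region": None, "buys": "no"},
    {"age_group": "old", "region": "north", "buys": "no"},
]


def generate_hypotheses(rows, min_support=2, min_confidence=0.8):
    """Propose every rule 'lhs=value -> rhs=value' between attribute pairs and
    keep those the data supports. No hypothesis is specified in advance."""
    attributes = sorted({key for row in rows for key in row})
    found = []
    for a, b in combinations(attributes, 2):
        for lhs, rhs in ((a, b), (b, a)):
            # Ignore records where either value is invalid instead of
            # requiring the whole data set to be clean.
            usable = [r for r in rows
                      if r.get(lhs) is not None and r.get(rhs) is not None]
            for lhs_value in {r[lhs] for r in usable}:
                matching = [r for r in usable if r[lhs] == lhs_value]
                for rhs_value in {r[rhs] for r in matching}:
                    hits = sum(1 for r in matching if r[rhs] == rhs_value)
                    confidence = hits / len(matching)
                    # Test each generated hypothesis against the data.
                    if hits >= min_support and confidence >= min_confidence:
                        found.append((lhs, lhs_value, rhs, rhs_value,
                                      hits, confidence))
    return sorted(found)


for rule in generate_hypotheses(records):
    print(rule)
```

On these toy records, the function reports that records in the "young" age group buy and those in the "old" group do not (plus the reverse rules), while no rule involving region meets the thresholds; the record with a missing region is simply left out of those comparisons. On databases with millions of records, real tools rely on far more efficient search strategies than this exhaustive enumeration.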
1.2.1 Data Mining: A Step in the KDD Process
Data mining is an automated analysis of data, using mathematical algorithms, to
find new patterns and relations in data. Data mining is often considered to be
only one step, albeit the crucial one, in a process called Knowledge Discovery in
Databases (KDD). Fayyad et al. define Knowledge Discovery in Databases as the
nontrivial process of identifying valid, novel, potentially useful, and ultimately
understandable patterns in data.[8] This process consists of five successive steps,
as shown in Figure 1.1. This section briefly explains how the KDD process takes
place.[9] A more detailed account of data mining techniques is provided in
Chapter 2.
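The idea of five successive steps, each working on the output of the previous one, can be sketched as a small Python pipeline. The step names used here follow the widely cited formulation by Fayyad et al. (selection, preprocessing, transformation, data mining, interpretation/evaluation) and may not match the labels in Figure 1.1. The data, field names, thresholds, and functions are hypothetical, and simple frequency counting stands in for the much richer techniques that real data mining uses.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, age, city, purchase_amount).
# Record 3 is invalid (missing age) and two city names are written inconsistently.
RAW = [
    (1, 23, "Leiden", 40.0),
    (2, 25, "leiden ", 55.0),
    (3, None, "Delft", 10.0),
    (4, 61, "Delft", 5.0),
    (5, 67, "delft", 8.0),
    (6, 22, "Leiden", 60.0),
]


def select(raw):
    """Selection: keep only the fields relevant to the question (drop the ID)."""
    return [(age, city, amount) for _, age, city, amount in raw]


def preprocess(rows):
    """Preprocessing: remove invalid records and normalise city names."""
    return [(age, city.strip().title(), amount)
            for age, city, amount in rows if age is not None]


def transform(rows):
    """Transformation: turn raw values into coarse categories suited to mining."""
    return [(city,
             "under_40" if age < 40 else "40_plus",
             "high" if amount >= 30 else "low")
            for age, city, amount in rows]


def mine(rows):
    """Data mining: count how often each (city, age band, spending) pattern occurs."""
    return Counter(rows)


def interpret(patterns, min_count=2):
    """Interpretation/evaluation: keep only patterns frequent enough to consider."""
    return {pattern: n for pattern, n in patterns.items() if n >= min_count}


def kdd(raw):
    """Run the five steps successively, each on the previous step's output."""
    return interpret(mine(transform(preprocess(select(raw)))))


print(kdd(RAW))
# {('Leiden', 'under_40', 'high'): 3, ('Delft', '40_plus', 'low'): 2}
```

The earlier steps shape what the mining step can find: if the city names were not normalised during preprocessing, "leiden " and "delft" would be counted as separate cities and the patterns would be split, and if the record with a missing age were not removed, the transformation step could not assign it an age band. Keeping each step as a separate function also makes it easy to inspect what happens at each stage.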
[7] Hand, D.J. (1998).
[8] Fayyad, U.M., Piatetsky-Shapiro, G., and Smyth, P. (1996b), p. 6.
[9] Distinguishing different steps in the complex KDD process may also be helpful in
developing ethical and legal solutions for the problems of group profiling using data
mining.