In the literature, many other types of algorithms are mentioned, but most of
them may be accounted for by the three types mentioned here. 9 These three types
of data mining may be relevant to group profiling.
Fig. 2.2 Examples of different types of discovery algorithms: pattern mining with a linear
regression function (A), clustering (B), and classification (C)
2.4.1 Classification
Classification in its simplest form is the ordering of data into groups or classes on
the basis of their similarity. 10 Similarity is usually determined using distance
scores (see the previous subsection). The difference between clustering and
classification is that classification uses predefined classes, whereas clustering is
used to establish such classes or groups. Two basic requirements of classification
are that the classes must be both exhaustive and mutually exclusive. 11 This means
that every data item can be assigned to one class and one class only. Some classes
may remain empty, but no data item may be left that cannot be assigned to any
class. Classifier induction is the task of learning a classification model from
training data. To learn from training data, the correct labels must be present in
the database table containing the training data. They are supplied by adding a
dedicated class attribute that records the class to which each record belongs. The
value of the class attribute for a record is often referred to as the class, class
label, or label of the record.
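The ideas above can be illustrated with a minimal sketch. It assumes a nearest-neighbour rule with Euclidean distance as the distance score, which is only one of many possible classifiers and is not taken from the text. The training table, the attribute values, the class names, and the function names are all hypothetical.

```python
import math

# Hypothetical training table: each record has two numeric attributes
# plus a dedicated class attribute (the label) stored alongside it.
TRAINING_DATA = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 4.5), "high"),
]

# Predefined classes (assumed for illustration). A class may stay empty,
# but every record must end up in exactly one of them.
CLASSES = {"low", "medium", "high"}


def euclidean(a, b):
    """Distance score between two records with numeric attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(record, training_data=TRAINING_DATA):
    """Return the label of the training record closest to `record`."""
    _, label = min(training_data, key=lambda row: euclidean(row[0], record))
    return label


# Each new record receives exactly one label, so the assignment is
# exhaustive and mutually exclusive by construction.
new_records = [(1.2, 1.4), (5.5, 5.0), (3.0, 3.2)]
assignments = {rec: classify(rec) for rec in new_records}
```

In this sketch the class "medium" never occurs in the training data, so no record is assigned to it. This matches the remark that a class may remain empty while every record still receives exactly one label.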
Example 2 (Classification) Based upon historical client records, an insurance
company wants to learn a model for predicting the risk category of a new
customer applying for car insurance, based upon his or her gender, type of car to
9 E.g., decision trees may be further divided into classification trees and regression trees;
see Berry, M.J.A., and Linoff, G.S. (2000).
10 Bailey, K.D. (1994).
11 Note that these requirements need not be fulfilled in the case of clustering.