depending on the clustering properties (therefore, on the object similarity criteria) one may impose. Moreover, even though there are validity techniques applicable to gauging clustering solutions, these solutions can nonetheless be evaluated from different perspectives.
For supervised classification problems, besides the X_ds set one also requires a set Ω_ds of class labels assigned to the data objects by a supervisor. For instance, for the above electrocardiogram classification problem, it is assumed that a supervisor (a physician, in this case) labeled each electrocardiogram as either "normal" or "abnormal". We denote by Ω = {ω_k; k = 1, ..., c} the set of c ∈ ℕ possible class labels (e.g., Ω = {"normal", "abnormal"} for the two-class electrocardiogram problem), and we will usually find it convenient to code the labels with numerical values from a set T = {t_k; k = 1, ..., c} ⊂ ℤ, using some one-to-one Ω → T mapping function (e.g., Ω = {"normal", "abnormal"} → T = {0, 1}). We call T the target value set. We then have T_ds = {t_i ∈ T; i = 1, 2, ..., n} as a set of n target values t_i = t(x_i) assigned by some unknown labeling function t : X → T. The target values are seen as instantiations of a target r.v. also denoted T.
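To make the label coding concrete, here is a minimal Python sketch (the language and all names are our own, chosen purely for illustration) of the one-to-one Ω → T mapping for the two-class electrocardiogram example:

```python
# Minimal sketch (hypothetical names): coding class labels into target values
# via a one-to-one mapping Omega -> T, as in the two-class ECG example.

# One-to-one mapping Omega = {"normal", "abnormal"} -> T = {0, 1}
label_to_target = {"normal": 0, "abnormal": 1}

# Supervisor-provided labels for n data objects (illustrative data)
omega_ds = ["normal", "abnormal", "normal", "normal", "abnormal"]

# T_ds = {t_i in T; i = 1, ..., n}: target values t_i = t(x_i)
t_ds = [label_to_target[label] for label in omega_ds]

print(t_ds)  # [0, 1, 0, 0, 1]
```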
We call a supervised classifier, or just classifier, any X → T mapping implemented by a supervised classification system. Designing a classifier corresponds to picking one function z_w out of a family of functions Z_W = {z_w : X → T; w ∈ W}, through the selection (tuning) of a parameter (either a single parameter or a multi-component parameter sequence) w from W. Examples of classifiers are decision trees and neural networks, where Z_W corresponds to all X → T mappings that the architecture of these devices implements, w being a particular choice of parameters (respectively, thresholds and weights), and W the parameter space. The classifier output Z_w = z_w(X) is a r.v. whose codomain is a subset of T.
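As a toy illustration of a parametric family Z_W (our own sketch, not a construction from the text), consider scalar inputs, T = {0, 1}, and a single threshold parameter w standing in for the thresholds and weights mentioned above:

```python
# Toy sketch: a one-parameter family Z_W = {z_w : X -> T; w in W},
# with X the real line, T = {0, 1}, and w a decision threshold
# (a stand-in for the thresholds/weights of trees or neural networks).

def make_classifier(w):
    """Pick one function z_w out of the family by fixing the parameter w."""
    def z_w(x):
        return 1 if x > w else 0
    return z_w

# Designing the classifier = selecting (tuning) w from the parameter space W
z = make_classifier(w=0.5)

print(z(0.2), z(0.9))  # 0 1
```

Tuning w here plays the role that training plays for trees and networks: it selects one particular z_w out of Z_W.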
A classifier may implement an X → T mapping using several "physical" outputs: tree leaves in the case of decision trees, network output neurons in the case of neural networks, etc. In some cases an arbitrarily large number of "physical" outputs may exist; for instance, a decision tree can have an arbitrarily large number of hierarchical levels and, consequently, an arbitrarily large number of leaves, the "physical" outputs. Mathematically, we are often only interested in characterizing a single X → T mapping, independently of how many "physical" outputs were used to implement that mapping. The association to class labels can be materialized in various ways. For instance, decision trees have as many outputs as there are tree leaves, whose number is usually larger than c; since each tree leaf represents a single class label, each class label may consequently be represented by more than one leaf. Other classifiers, such as many neural networks, have instead c outputs, as many as class labels, with the possible exception of two-class problems, for which only one output is needed, usually coded 1 for one of the classes and 0 or −1 for its complement. For c > 2 it is customary to express both the target values and the outputs as c-dimensional vectors using a 1-of-c coding scheme.
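The 1-of-c coding scheme can be sketched in a few lines (again a hypothetical Python illustration): each class label ω_k is represented by the c-dimensional vector with a 1 in position k and 0 elsewhere:

```python
# Sketch of the 1-of-c (one-hot) coding scheme for c > 2 classes.

def one_of_c(k, c):
    """Return the 1-of-c target vector for class index k (0-based) among c classes."""
    return [1 if j == k else 0 for j in range(c)]

# e.g., c = 3 classes
for k in range(3):
    print(one_of_c(k, 3))
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
```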