regression, and probability density function (PDF) estimation, the data-based device has to learn some desired information (respectively, class labeling, functional description, and probability density) from the data.
To formalize the classification problem, we start by assuming that a dataset X_ds is available for the inductive design of the classifier, the design or training set. The training set X_ds can be viewed as an array whose rows correspond to data objects (e.g., individual electrocardiograms for the above electrocardiogram classification problem) and whose columns represent object attributes (measurements, features). We denote by n the number of objects (also called instances or cases) of X_ds. Each instance is represented by an ordered sequence of d attributes x_j from some space X (the input space of the classification system). The attributes can be numerical, in which case we always assume an underlying real-number domain, or nominal (categorical), taking values in some set B of categories. For the above electrocardiogram classification problem an instance is represented by electrocardiographic signal features (amplitudes and durations of signal waves), measured as real numbers, and by categorical features such as sex (B = {“male”, “female”}).
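As a rough sketch of how such a training set might be stored, assuming a few hypothetical electrocardiographic attributes (the feature names and values below are invented for illustration only):

```python
# Hypothetical training set X_ds for the electrocardiogram example:
# each row (tuple) is one instance, each position one attribute.
# Numerical attributes are real-valued; "sex" is nominal (categorical).
X_ds = [
    # (P-wave amplitude in mV, QRS duration in s, sex)
    (0.12, 0.08, "male"),
    (0.10, 0.11, "female"),
    (0.15, 0.09, "male"),
]

n = len(X_ds)      # number of instances
d = len(X_ds[0])   # number of attributes per instance
print(n, d)        # -> 3 3
```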
We will often be dealing with instances characterized solely by numerical attributes; in this case X_ds ⊂ X = R^d, and any instance x ∈ X is (represented as) an ordered sequence (d-tuple): x = (x_1, x_2, ..., x_j, ..., x_d). Sometimes we may find it convenient to use vector notation for x, x = [x_1 x_2 ... x_j ... x_d]^T, specifically when vector operations are required; X_ds is then represented by an n × d real matrix.
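A minimal sketch of this numeric representation, using NumPy (the particular values and the use of NumPy are illustrative assumptions, not part of the text):

```python
import numpy as np

# X_ds as an n x d real matrix: rows are instances, columns are (numerical) attributes.
X_ds = np.array([
    [0.12, 0.08, 62.0],
    [0.10, 0.11, 45.0],
    [0.15, 0.09, 71.0],
])

n, d = X_ds.shape            # n = 3 instances, d = 3 attributes
x = X_ds[0]                  # one instance as a d-tuple (x_1, ..., x_d)
x_col = x.reshape(-1, 1)     # column vector [x_1 x_2 ... x_d]^T for vector operations
print(n, d, x_col.shape)     # -> 3 3 (3, 1)
```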
Any attribute value x_j is a realization value of a random variable (r.v.) X_j, whose codomain is X_j; whether X_j denotes a codomain or a variable will be obvious from the context. Note that X_j may have a single Dirac-δ distribution, in which case X_j is in fact a deterministic variable (a degenerate random variable). We will also denote by X the d-dimensional r.v. whose codomain is X and whose realization values are the d-tuples x = (x_1, x_2, ..., x_j, ..., x_d); X will be characterized by a joint distribution of the X_j with cumulative distribution function F_X.
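For concreteness, the joint cumulative distribution function mentioned above is the standard one (this definition is supplied here for convenience; it is not spelled out in the text):

F_X(x) = P(X_1 ≤ x_1, X_2 ≤ x_2, ..., X_d ≤ x_d), for x = (x_1, x_2, ..., x_d) ∈ X.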
Throughout the text, all data instances in X_ds are assumed to have been obtained by an independent and identically distributed (i.i.d.) sampling process from a d-dimensional joint probability distribution with cumulative distribution function F_X, characterizing a large (perhaps infinite) population of instances.
of instances. For numerical attributes defined in bounded intervals of
×
(as
the electrocardiographic measurements) one may still use the real line as
domain, by assigning zero probability outside the intervals.
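To make the i.i.d. sampling assumption concrete, the following sketch draws n independent instances from one fixed d-dimensional distribution; the Gaussian used here is only a stand-in for the unknown F_X:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 100, 3
mean = np.zeros(d)
cov = np.eye(d)

# The n rows are independent draws from the same d-dimensional distribution,
# i.e., i.i.d. realizations of the random variable X.
X_ds = rng.multivariate_normal(mean, cov, size=n)   # shape (n, d)
print(X_ds.shape)                                    # -> (100, 3)
```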
When confronted with unsupervised classification problems (popularly known as data clustering problems), i.e., when one wants the classification system to find a structuring solution that partitions the data into “meaningful” groups (clusters) according to certain criteria, the X_ds set, X_ds = {x_i = (x_i1, x_i2, ..., x_ij, ..., x_id); i = 1, ..., n}, is all that is required. Data clustering is a somewhat loose type of classification problem, since one may find a variety of solutions (unsupervised classifiers) depending on the criteria used.
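As one possible illustration of the clustering setting, the sketch below partitions the rows of X_ds into k groups with a plain k-means loop; k-means is just one of many unsupervised classifiers and is not singled out by the text:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Partition the rows of X (an n x d array) into k clusters; returns a label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # start from k random instances
    for _ in range(iters):
        # assign every instance to its nearest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of the instances assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two loose groups of 2-D instances, purely for demonstration.
rng = np.random.default_rng(1)
X_ds = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
print(kmeans(X_ds, k=2))
```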