Applications - Minimum Error Entropy Classification


This degree of similarity is defined by a similarity or dissimilarity measure, where "measure" is allowed to be interpreted here in a lax sense, as some monotonic set function $d(C_i, C_j)$ whose minimum value is $d(C_i, C_i)$ for any $C_i$.

The most common dissimilarity measure between two real-valued vectors $\mathbf{x}$ and $\mathbf{y}$ is the weighted $L_p$ metric,

$$d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{d} w_i \, |x_i - y_i|^p \right)^{1/p} \tag{6.50}$$

where $x_i$ and $y_i$ are the $i$th coordinates of $\mathbf{x}$ and $\mathbf{y}$, $i = 1, \ldots, d$, and $w_i \geq 0$ is the $i$th weight coefficient. The unweighted ($w = 1$) $L_p$ metric is also known as the Minkowski distance of order $p$ ($p \geq 1$). Examples of this distance are the well-known Euclidean distance (the most common dissimilarity measure used by clustering algorithms), obtained by setting $p = 2$, the Manhattan distance, for $p = 1$, and the $L_\infty$ or Chebyshev distance. The LEGClust algorithm also uses a dissimilarity measure, although defined in an unconventional way. Dissimilarities between objects $\mathbf{x}_i$ and $\mathbf{x}_j$, for all objects represented by a set of vectors $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$, $\mathbf{x}_i \in \mathbb{R}^d$, are conveniently arranged in a dissimilarity matrix $A \in \mathbb{R}^{n \times n}$, where each element of $A$ is $a_{i,j} = d(\mathbf{x}_i, \mathbf{x}_j)$, with $d(\mathbf{x}_i, \mathbf{x}_j)$ the dissimilarity between $\mathbf{x}_i$ and $\mathbf{x}_j$ (strictly speaking, $d(\{\mathbf{x}_i\}, \{\mathbf{x}_j\})$ for the singleton sets $\{\mathbf{x}_i\}$ and $\{\mathbf{x}_j\}$).
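As a minimal sketch of how the weighted $L_p$ metric of Eq. (6.50) and the resulting dissimilarity matrix $A$ might be computed, the following Python fragment uses only the standard library. The function names `weighted_lp` and `dissimilarity_matrix` are hypothetical, and the handling of $p = \infty$ (the largest coordinate difference among positively weighted coordinates) is an assumption made for illustration. This is the conventional metric only; it is not the dissimilarity used by LEGClust, which is defined differently.

```python
import math


def weighted_lp(x, y, p=2.0, w=None):
    """Weighted L_p dissimilarity between two equal-length vectors.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance,
    and p = math.inf the Chebyshev distance (assumed limit case).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension d")
    if w is None:
        w = [1.0] * len(x)  # unweighted case: Minkowski distance of order p
    if len(w) != len(x) or any(wi < 0 for wi in w):
        raise ValueError("need one non-negative weight per coordinate")
    if p < 1:
        raise ValueError("p must satisfy p >= 1")

    diffs = [abs(xi - yi) for xi, yi in zip(x, y)]
    if math.isinf(p):
        # Limit p -> infinity: only coordinates with a positive weight matter.
        return max((d for d, wi in zip(diffs, w) if wi > 0), default=0.0)
    return sum(wi * d ** p for wi, d in zip(w, diffs)) ** (1.0 / p)


def dissimilarity_matrix(points, p=2.0, w=None):
    """Return the n x n matrix A with a[i][j] = d_p(points[i], points[j])."""
    n = len(points)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The metric is symmetric, so each pair is computed once.
            a[i][j] = a[j][i] = weighted_lp(points[i], points[j], p, w)
    return a


# Hypothetical two-dimensional objects, chosen only for illustration.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
A = dissimilarity_matrix(points, p=2.0)
```

The diagonal of `A` is zero, consistent with the requirement that the measure attain its minimum when an object is compared with itself. Only the upper triangle is computed explicitly because the weighted $L_p$ metric is symmetric.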

6.4.4 The LEGClust Algorithm

As mentioned earlier, clustering solutions may vary widely with the algorithm being used and, for the same algorithm, with its specific settings. People also cluster data differently according to their knowledge, perspective or experience. In [201] some clustering tests were performed involving several types of individuals in order to try to understand the mental process of data clustering. The tests used two-dimensional datasets similar to those to be presented in Sect. 6.4.5. Figure 6.20 shows one such dataset with different clustering solutions suggested by different individuals.

The most important conclusion presented in [201] was that human clustering exhibits some balance between the importance given to local (e.g., connectedness) and global (e.g., structuring direction) features of the data. The tests also provided majority choices of clustering solutions that one can use to compare the results of different clustering algorithms.

The following sections describe the LEGClust algorithm, first presented in [204]. We first introduce the entropic dissimilarity matrix and, based on that, the computation of the so-called layered entropic proximity matrix.
