SSL methods proceed in different ways, for instance, by first applying an
unsupervised method to estimate the mixture distribution P(x), and then
associating the latent groups obtained with the observed classes using the
labelled set D_l. These methods are derived from one or more of the following
assumptions: smoothness (the label function is smoother in high-density regions
than in low-density regions); clustering (if points are in the same cluster,
they are likely to be of the same class); manifold (the high-dimensional data
lie roughly on a low-dimensional manifold); and transduction (directly
estimating the finite set of test labels, i.e., a function f : X_u → Y defined
only on the test set, instead of inferring f : X → Y on the entire space X and
afterwards returning f(x_i) at the test points).
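As a rough illustration of the first strategy (estimate the mixture P(x) from
all available points, then tie its latent components to classes through D_l),
the sketch below uses scikit-learn's GaussianMixture on invented
two-dimensional data; the blobs, the two labelled points, and the simple
component-to-class mapping are assumptions made for the example rather than
details taken from the text.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs; only one point per class is labelled (D_l).
X_unlabelled = np.vstack([rng.normal(0, 1, (200, 2)),
                          rng.normal(5, 1, (200, 2))])
X_labelled = np.array([[0.1, -0.2], [5.2, 4.9]])
y_labelled = np.array([0, 1])

# Step 1: estimate the mixture P(x) from labelled and unlabelled points together.
gmm = GaussianMixture(n_components=2, random_state=0).fit(
    np.vstack([X_unlabelled, X_labelled]))

# Step 2: associate each latent component with an observed class using D_l.
# With more labelled points one would take a majority vote per component.
component_to_class = dict(zip(gmm.predict(X_labelled), y_labelled))

# The unlabelled points inherit the class of their most likely component.
y_pred = np.array([component_to_class[c] for c in gmm.predict(X_unlabelled)])
print(y_pred[:5], y_pred[-5:])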
Thus, SSL methods can be roughly organized into four classes depending on the
assumptions considered: generative models (e.g., semi-supervised clustering
with constraints, SSL using maximum likelihood estimation); low-density
separation (e.g., the transductive support vector machine, SSL using
semi-definite programming, data-dependent regularization); graph-based methods
(e.g., discrete regularization, SSL with conditional harmonic mixing); and
change of representation (e.g., graph kernels by spectral transforms, spectral
methods for dimensionality reduction).
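Among these families, graph-based methods are perhaps the easiest to try off
the shelf. Purely as an illustration (the data set, the -1 convention for
unlabelled points, and the parameter values are assumptions, not details from
the text), the following sketch runs scikit-learn's LabelSpreading on a toy
two-moons problem.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# Toy problem: two interleaving half-moons with most labels hidden.
X, y_true = make_moons(n_samples=300, noise=0.1, random_state=0)
y = np.full_like(y_true, -1)  # -1 marks unlabelled points
labelled_idx = np.random.RandomState(0).choice(len(y), size=10, replace=False)
y[labelled_idx] = y_true[labelled_idx]

# Graph-based SSL: labels diffuse over a k-nearest-neighbour graph of the data.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)

accuracy = (model.transduction_ == y_true).mean()
print(f"transductive accuracy: {accuracy:.2f}")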
Theoretical work in SSL has incorporated and adapted a diverse set of tools
that were initially developed in other branches of machine learning, such as
kernel methods or Bayesian techniques. However, it has been stated that the
most relevant work in SSL lies in practical issues related to real-world
applications [6]. Some examples of applications of SSL are the classification
of protein sequences, the prediction of protein functions, speech recognition,
and webpage classification. SSL only works in those cases where the knowledge
of p(x) gained through the unlabelled data carries information that is useful
for the inference of p(y|x). The existence of classes must also be guaranteed:
if there is a densely populated continuum of objects, it may seem unlikely
that they could ever be distinguished into different classes.
1.1.3 Hierarchical Clustering
There are many cases in which a big cluster can be divided into meaningful
subclusters, which can in turn be divided into smaller subclusters, and so on.
This kind of grouping procedure is called hierarchical clustering and is
commonly used for summarising data structures. The most natural representation
of hierarchical clustering is the corresponding tree, called a dendrogram,
which shows how the samples are grouped; see Fig. 1.5. Clusters at a given
level are grouped according to a similarity measure, and this measure can be
used to determine the best level of partition in the hierarchical structure,
depending on the specifics of the application. Another representation for
hierarchical clustering uses sets to represent the subclusters [9].
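A minimal sketch of building and cutting such a dendrogram, assuming SciPy,
Matplotlib, and an invented two-dimensional data set with three loose groups
(none of which appear in the text):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Three loose groups of ten points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.5, (10, 2)) for m in (0, 3, 6)])

# Agglomerative clustering: repeatedly merge the two closest (sub)clusters.
Z = linkage(X, method="average", metric="euclidean")

# Cutting the tree at a chosen level yields a flat partition of the samples.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# The dendrogram displays the full merge hierarchy.
dendrogram(Z)
plt.show()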
Several similarity or distance measures have been proposed, and these give
rise to different hierarchical structures. Among the most common measures are
the Euclidean, city-block, Chebyshev, Minkowski, quadratic, and Mahalanobis
distances.
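To make these measures concrete, the sketch below evaluates several of them on
made-up vectors using SciPy; the Mahalanobis distance additionally requires the
inverse covariance matrix of the data, and the quadratic distance has the same
form with an arbitrary positive-definite matrix in place of that inverse
covariance.

import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 0.0, 5.0])

print(distance.euclidean(u, v))        # straight-line distance
print(distance.cityblock(u, v))        # city-block (Manhattan) distance
print(distance.chebyshev(u, v))        # largest coordinate difference
print(distance.minkowski(u, v, p=3))   # Minkowski distance of order p

# Mahalanobis distance needs the inverse covariance VI of some reference data.
data = np.random.default_rng(0).normal(size=(50, 3))
VI = np.linalg.inv(np.cov(data, rowvar=False))
print(distance.mahalanobis(u, v, VI))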
These distances are defined between two data points. There is another type of