SSL methods proceed in different ways, for instance, by first applying an
unsupervised method to estimate the mixture distribution P(x), and then
associating the latent groups obtained with the observed classes using the
labelled set D_l. These methods are derived from one or more of the following
assumptions: smoothness (the label function is smoother in high-density regions
than in low-density regions); clustering (if points are in the same cluster,
they are likely to be of the same class); manifold (the high-dimensional data
lie roughly on a low-dimensional manifold); and transduction (directly
estimating the finite set of test labels, i.e., a function f : X_u → Y defined
only on the test set, instead of inferring f : X → Y on the entire space X and
afterwards returning f(x_i) at the test points).
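As a rough illustration of the first strategy (estimate the mixture P(x) from
all available points, then tie its latent components to classes through D_l),
the sketch below uses scikit-learn's GaussianMixture on invented
two-dimensional data; the blobs, the two labelled points, and the simple
component-to-class mapping are assumptions made for the example rather than
details taken from the text.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs; only one point per class is labelled (D_l).
X_unlabelled = np.vstack([rng.normal(0, 1, (200, 2)),
                          rng.normal(5, 1, (200, 2))])
X_labelled = np.array([[0.1, -0.2], [5.2, 4.9]])
y_labelled = np.array([0, 1])

# Step 1: estimate the mixture P(x) from labelled and unlabelled points together.
gmm = GaussianMixture(n_components=2, random_state=0).fit(
    np.vstack([X_unlabelled, X_labelled]))

# Step 2: associate each latent component with an observed class using D_l.
# With more labelled points one would take a majority vote per component.
component_to_class = dict(zip(gmm.predict(X_labelled), y_labelled))

# The unlabelled points inherit the class of their most likely component.
y_pred = np.array([component_to_class[c] for c in gmm.predict(X_unlabelled)])
print(y_pred[:5], y_pred[-5:])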
Thus, SSL methods can be roughly organized into four classes depending on the
assumptions considered: generative models (e.g., semi-supervised clustering
with constraints, SSL using maximum likelihood estimation); low-density
separation (e.g., the transductive support vector machine, SSL using
semi-definite programming, data-dependent regularization); graph-based methods
(e.g., discrete regularization, SSL with conditional harmonic mixing); and
change of representation (e.g., graph kernels by spectral transforms, spectral
methods for dimensionality reduction).
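Among these families, graph-based methods are perhaps the easiest to try off
the shelf. Purely as an illustration (the data set, the -1 convention for
unlabelled points, and the parameter values are assumptions, not details from
the text), the following sketch runs scikit-learn's LabelSpreading on a toy
two-moons problem.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# Toy problem: two interleaving half-moons with most labels hidden.
X, y_true = make_moons(n_samples=300, noise=0.1, random_state=0)
y = np.full_like(y_true, -1)  # -1 marks unlabelled points
labelled_idx = np.random.RandomState(0).choice(len(y), size=10, replace=False)
y[labelled_idx] = y_true[labelled_idx]

# Graph-based SSL: labels diffuse over a k-nearest-neighbour graph of the data.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)

accuracy = (model.transduction_ == y_true).mean()
print(f"transductive accuracy: {accuracy:.2f}")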
Theoretical work in SSL has incorporated and adapted a diverse set of tools
that were initially developed in other branches of machine learning, such as
kernel methods or Bayesian techniques. However, it has been stated that the
most relevant work in SSL lies in practical issues related to real-world
applications [6]. Some examples of applications of SSL are the classification
of protein sequences, the prediction of protein functions, speech recognition,
and webpage classification. SSL only works in those cases where the knowledge
of p(x) gained through the unlabelled data carries information that is useful
for the inference of p(y|x). The existence of classes must also be guaranteed:
if there is a densely populated continuum of objects, it may seem unlikely
that they could ever be distinguished into different classes.
1.1.3 Hierarchical Clustering
There are many cases in which a big cluster can be divided into meaningful
subclusters, which can in turn be divided into smaller subclusters, and so on.
This kind of grouping procedure is called hierarchical clustering and is
commonly used for summarising data structures. The most natural representation
of hierarchical clustering is the corresponding tree, called a dendrogram,
which shows how the samples are grouped; see Fig. 1.5. Clusters at a given
level are grouped according to a similarity measure, and this measure can be
used to determine the best level of partition in the hierarchical structure,
depending on the specifics of the application. Another representation for
hierarchical clustering uses sets to represent the subclusters [9].
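A minimal sketch of building and cutting such a dendrogram, assuming SciPy,
Matplotlib, and an invented two-dimensional data set with three loose groups
(none of which appear in the text):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Three loose groups of ten points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.5, (10, 2)) for m in (0, 3, 6)])

# Agglomerative clustering: repeatedly merge the two closest (sub)clusters.
Z = linkage(X, method="average", metric="euclidean")

# Cutting the tree at a chosen level yields a flat partition of the samples.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# The dendrogram displays the full merge hierarchy.
dendrogram(Z)
plt.show()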
Several similarity or distance measures have been proposed, and these give
rise to different hierarchical structures. Among the most common measures are
the Euclidean, city-block, Chebyshev, Minkowski, quadratic, and Mahalanobis
distances.
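To make these measures concrete, the sketch below evaluates several of them on
made-up vectors using SciPy; the Mahalanobis distance additionally requires the
inverse covariance matrix of the data, and the quadratic distance has the same
form with an arbitrary positive-definite matrix in place of that inverse
covariance.

import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 0.0, 5.0])

print(distance.euclidean(u, v))        # straight-line distance
print(distance.cityblock(u, v))        # city-block (Manhattan) distance
print(distance.chebyshev(u, v))        # largest coordinate difference
print(distance.minkowski(u, v, p=3))   # Minkowski distance of order p

# Mahalanobis distance needs the inverse covariance VI of some reference data.
data = np.random.default_rng(0).normal(size=(50, 3))
VI = np.linalg.inv(np.cov(data, rowvar=False))
print(distance.mahalanobis(u, v, VI))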
These distances are defined between two data points. There is another type of