10.1 Introduction
Due to the rapid development of information technology in recent years, it is common to encounter enormous amounts of data collected from diverse sources. This has led to a great demand for innovative analytic tools that can handle the kinds of complex data sets that cannot be tackled using traditional statistical methods. Modern data visualization techniques face a similar situation and must also provide adequate solutions.
High dimensionality is always an obstacle to the success of data visualization. Beyond this problem, exploring the information and structures hidden in complicated data can be very challenging. Parametric models, on the one hand, are often inadequate for complicated data; on the other hand, traditional nonparametric methods can be far too complex to implement in a stable and affordable way due to the "curse of dimensionality." Thus, the development of new nonparametric methods for analyzing massive data sets is a demanding but important task. Following the recent successes in many fields of machine learning, kernel methods (e.g., Vapnik) can certainly provide us with powerful tools for such analyses. Kernel machines facilitate flexible and versatile nonlinear analysis of data in a very high-dimensional (often infinite-dimensional) reproducing kernel Hilbert space (RKHS). The rich mathematical theory, as well as the topological and geometric structures associated with reproducing kernel Hilbert spaces, enables probabilistic interpretation and statistical inference. They also provide a convenient environment suitable for massive computation.
In many classical approaches, statistical procedures are carried out directly on sample data in Euclidean space R^p. In kernel methods, data are first mapped to a high-dimensional Hilbert space via a certain kernel or its spectrum, and classical statistical procedures are then applied to these kernel-transformed data. Kernel transformations provide us with a new way of specifying a "distance" or "similarity" metric between different elements.
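To make this mapping step concrete, here is a minimal sketch in Python with NumPy (our illustrative choice of language; the Gaussian RBF kernel and the bandwidth parameter `sigma` are likewise assumptions made for the example, not prescribed by the text) that turns raw Euclidean data into a Gram matrix of pairwise similarities:

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    Each entry is the RKHS inner product of the kernel-mapped points,
    i.e., a "similarity" between observations i and j.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

# Example: five points in R^3 become a 5 x 5 similarity matrix.
X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel_matrix(X, sigma=1.5)
```

Once the data are in this Gram-matrix form, any procedure that depends on the observations only through inner products can be applied without ever forming the high-dimensional feature vectors explicitly.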
After preparing the raw data in kernel form, standard statistical and/or mathematical software can be used to explore nonlinear data structures. For instance, we can perform nonlinear dimension reduction by kernel principal component analysis (KPCA), which can be used to construct high-quality classifiers as well as to provide new angles of view in data visualization. That is, we are able to view the more complicated (highly nonlinear) structures of massive data sets without the need to overcome the computational difficulties of building complex models. Many multivariate methods can also be extended to cover highly nonlinear cases through a kernel machine framework.
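As a minimal sketch of how such a kernelized procedure can be carried out (continuing the Python/NumPy example above; the function and variable names are our own, and the centering-then-eigendecomposition route is the standard KPCA recipe), the following projects the data onto the leading kernel principal components for plotting:

```python
def kernel_pca(K, n_components=2):
    """Coordinates of the training points on the leading kernel PCs.

    K: (n x n) Gram matrix, e.g., from rbf_kernel_matrix above.
    Returns an (n x n_components) array suitable for a 2-D scatter plot.
    """
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Center the implicitly mapped data in feature space.
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]  # pick the largest ones
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    # Projection of point i onto component k is sqrt(lambda_k) * alpha[i, k].
    return alpha * np.sqrt(np.maximum(lam, 0.0))

coords = kernel_pca(K, n_components=2)  # reuses K from the previous sketch
```

The resulting `coords` can be passed directly to any two-dimensional scatter-plot routine, giving a low-dimensional view of nonlinear structure without fitting an explicit nonlinear model.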
In this article, by combining the classical methods of multivariate analysis, such as PCA, canonical correlation analysis (CCA), and cluster analysis, with kernel machines, we introduce their kernelized counterparts, which enable more versatile and flexible data visualization.