10.1 Introduction
Due to the rapid development of information technology in recent years, it is common to encounter enormous amounts of data collected from diverse sources. This has led to a great demand for innovative analytic tools that can handle the kinds of complex data sets that cannot be tackled using traditional statistical methods. Modern data visualization techniques face a similar situation and must also provide adequate solutions.
High dimensionality is always an obstacle to the success of data visualization. Beyond this problem, exploring the information and structures hidden in complicated data can be very challenging. Parametric models, on the one hand, are often inadequate for complicated data; on the other hand, traditional nonparametric methods can be far too complex to implement in a stable and affordable way due to the "curse of dimensionality." Thus, the development of new nonparametric methods for analyzing massive data sets is a demanding but important task. Following the recent successes in many fields of machine learning, kernel methods (e.g., Vapnik) can certainly provide us with powerful tools for such analyses. Kernel machines facilitate flexible and versatile nonlinear analysis of data in a very high-dimensional (often infinite-dimensional) reproducing kernel Hilbert space (RKHS). The rich mathematical theory, as well as the topological and geometric structures associated with reproducing kernel Hilbert spaces, enables probabilistic interpretation and statistical inference. They also provide a convenient environment suitable for massive computation.
In many classical approaches, statistical procedures are carried out directly on sample data in Euclidean space R^p. In kernel methods, data are first mapped to a high-dimensional Hilbert space via a certain kernel or its spectrum, and classical statistical procedures are then applied to these kernel-transformed data. Kernel transformations provide us with a new way of specifying a "distance" or "similarity" metric between different elements.
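To make this mapping step concrete, here is a minimal sketch in Python with NumPy (our illustrative choice of language; the Gaussian RBF kernel and the bandwidth parameter `sigma` are likewise assumptions made for the example, not prescribed by the text) that turns raw Euclidean data into a Gram matrix of pairwise similarities:

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)).

    Each entry is the RKHS inner product of the kernel-mapped points,
    i.e., a "similarity" between observations i and j.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

# Example: five points in R^3 become a 5 x 5 similarity matrix.
X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel_matrix(X, sigma=1.5)
```

Once the data are in this Gram-matrix form, any procedure that depends on the observations only through inner products can be applied without ever forming the high-dimensional feature vectors explicitly.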
After preparing the raw data in kernel form, standard statistical and/or mathematical software can be used to explore nonlinear data structures. For instance, we can perform nonlinear dimension reduction by kernel principal component analysis (KPCA), which can be used to construct high-quality classifiers as well as to provide new angles of view in data visualization. That is, we are able to view the more complicated (highly nonlinear) structures of massive data sets without the need to overcome the computational difficulties of building complex models. Many multivariate methods can also be extended to cover highly nonlinear cases through a kernel machine framework.
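As a minimal sketch of how such a kernelized procedure can be carried out (continuing the Python/NumPy example above; the function and variable names are our own, and the centering-then-eigendecomposition route is the standard KPCA recipe), the following projects the data onto the leading kernel principal components for plotting:

```python
def kernel_pca(K, n_components=2):
    """Coordinates of the training points on the leading kernel PCs.

    K: (n x n) Gram matrix, e.g., from rbf_kernel_matrix above.
    Returns an (n x n_components) array suitable for a 2-D scatter plot.
    """
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Center the implicitly mapped data in feature space.
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]  # pick the largest ones
    lam, alpha = eigvals[idx], eigvecs[:, idx]
    # Projection of point i onto component k is sqrt(lambda_k) * alpha[i, k].
    return alpha * np.sqrt(np.maximum(lam, 0.0))

coords = kernel_pca(K, n_components=2)  # reuses K from the previous sketch
```

The resulting `coords` can be passed directly to any two-dimensional scatter-plot routine, giving a low-dimensional view of nonlinear structure without fitting an explicit nonlinear model.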
In this article, by combining the classical methods of multivariate analysis, such as PCA, canonical correlation analysis (CCA), and cluster analysis, with kernel machines, we introduce their kernelized counterparts, which enable more versatile and flexible data visualization.