Sorting High-Dimensional Patterns with Unsupervised
Nearest Neighbors
Oliver Kramer
Department of Computer Science, University of Oldenburg,
Uhlhornsweg 84, 26111 Oldenburg, Germany
oliver.kramer@uni-oldenburg.de
Abstract. In many scientific disciplines structures in high-dimensional data have
to be detected, e.g., in stellar spectra, genome data, or in face recognition tasks. In
this work we present an approach to non-linear dimensionality reduction based on
fitting nearest neighbor regression into the unsupervised regression framework for
learning low-dimensional manifolds. The problem of optimizing latent neighbor-
hoods is difficult to solve, but the unsupervised nearest neighbor (UNN) formula-
tion allows an efficient strategy of iteratively embedding latent points to discrete
neighborhood topologies. The choice of an appropriate loss function is relevant,
in particular for noisy and high-dimensional data spaces. We extend UNN by the
ε-insensitive loss, which makes it possible to ignore small residuals below a defined
threshold. Furthermore, we introduce techniques to handle incomplete data. Experi-
mental analyses on various artificial and real-world test problems demonstrate
the performance of the approaches.
Keywords: Dimensionality reduction, Unsupervised regression, Nearest neigh-
bors, Robust loss functions, Missing data.
1 Introduction
Dimensionality reduction and manifold learning have an important part to play in the
understanding of data. Many disciplines in science and economy are based on collecting
high-dimensional patterns: from astronomy to psychology, from civil engineering to so-
cial web services. Algorithms are required that are able to process data efficiently. The
collection and understanding of data allows us to improve the efficiency of processes
in a variety of domains. There are numerous examples that reflect the importance of
the understanding of large data sets. The quality of sensors is steadily being improved.
The trend towards digitizing the world leads to large amounts of high-dimensional pat-
terns. For an efficient data analysis process, fast dimensionality reduction methods are
required. Unsupervised nearest neighbors (UNN) is a fast iterative approach based on
unsupervised regression. The idea
of unsupervised regression is to reverse functional regression models such that low-
dimensional data samples in latent space optimally reconstruct high-dimensional out-
put data. We take this framework as basis for an iterative approach that fits K-nearest
neighbors (KNN) regression into this unsupervised setting.
The manifold problem we consider is a point-wise mapping F : y → x from patterns
y ∈ ℝ^d to latent embeddings x ∈ ℝ^q with d > q. The problem is a hard optimization
problem, as the latent variables X = (x_1, ..., x_N) are unknown.
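The reconstruction idea behind unsupervised regression can be illustrated with a minimal sketch (the function and variable names below are illustrative, not the paper's implementation): given candidate latent positions X, KNN regression reconstructs each pattern y_i as the mean of the patterns belonging to the K nearest latent neighbors of x_i, and the quality of the embedding is measured by the resulting squared reconstruction error in data space.

```python
import numpy as np

def knn_reconstruct(X, Y, k=2):
    # For each latent point x_i, find its k nearest latent neighbors
    # (excluding x_i itself) and reconstruct y_i as the mean of the
    # patterns of those neighbors.
    N = X.shape[0]
    Y_hat = np.empty_like(Y)
    for i in range(N):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the point itself
        nbrs = np.argsort(d)[:k]
        Y_hat[i] = Y[nbrs].mean(axis=0)
    return Y_hat

def reconstruction_error(X, Y, k=2):
    # Squared data space reconstruction error of the latent embedding X:
    # the quantity a UNN-style method seeks to minimize over X.
    return np.sum((Y - knn_reconstruct(X, Y, k)) ** 2)
```

A latent embedding that preserves the neighborhood structure of the patterns yields a low reconstruction error, while a scrambled embedding yields a high one; iteratively placing latent points so as to reduce this error is the core of the approach.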