3.1 UNN Regression Problem
An UNN regression manifold is defined by variables $\mathbf{x}$ from $\mathbf{X} \in \mathbb{R}^{q \times N}$ with the unsupervised formulation of an UNN regression manifold

$$f_{\text{UNN}}(\mathbf{x}; \mathbf{X}) := \frac{1}{K} \sum_{i \in N_K(\mathbf{x}, \mathbf{X})} \mathbf{y}_i. \qquad (3)$$

Matrix $\mathbf{X}$ contains the latent points $\mathbf{x}_i$ that define the manifold, i.e., the low-dimensional representation of patterns $\mathbf{Y}$. Parameter $\mathbf{x}$ is the location where the function is evaluated.
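To make Equation (3) concrete, the following is a minimal sketch in Python/NumPy, assuming Euclidean distances in latent space; the names unn_regression, X_latent, and Y_patterns are illustrative choices, not notation from the text:

```python
import numpy as np

def unn_regression(x, X_latent, Y_patterns, K):
    """Evaluate f_UNN(x; X): average the K patterns y_i whose latent
    points are nearest to the query location x (Equation (3)).

    x          : (q,)   location in latent space where f_UNN is evaluated
    X_latent   : (q, N) matrix X of latent points
    Y_patterns : (d, N) matrix Y of observed patterns
    """
    # N_K(x, X): indices of the K latent points closest to x
    dists = np.linalg.norm(X_latent - x[:, None], axis=0)
    neighbors = np.argsort(dists)[:K]
    # f_UNN(x; X) = (1/K) * sum of the corresponding patterns y_i
    return Y_patterns[:, neighbors].mean(axis=1)

# Toy usage: q = 1, d = 1, N = 4; querying at x = 0 with K = 2
X = np.array([[0.0, 1.0, 2.0, 3.0]])
Y = np.array([[0.0, 1.0, 4.0, 9.0]])
print(unn_regression(np.array([0.0]), X, Y, K=2))  # -> [0.5]
```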
An optimal UNN regression manifold minimizes the DSRE:

$$\text{minimize}\; E(\mathbf{X}) = \frac{1}{N} \left\| \mathbf{Y} - f_{\text{UNN}}(\mathbf{x}; \mathbf{X}) \right\|_F^2, \qquad (4)$$

with Frobenius norm $\|\mathbf{A}\|_F := \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$. In other words: an optimal UNN manifold consists of low-dimensional points $\mathbf{X}$ that minimize the reconstruction error of the data points $\mathbf{Y}$ w.r.t. KNN regression in data space. Regularization for UNN regression is not as important as regularization for other methods that fit into the unsupervised regression framework. For example, in UKR regularization means penalizing extension in latent space with $E_p(\mathbf{X}) = E(\mathbf{X}) + \lambda \|\mathbf{X}\|^2$ and weight $\lambda$ [16]. In KNN regression, moving the low-dimensional data samples infinitely apart from each other does not have the same effect, as long as we can still determine the K-nearest neighbors. For the fixed discrete latent positions, regularization is not necessary, as the latent points cannot move arbitrarily in latent space.
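Continuing the sketch above, the DSRE of Equation (4) can be evaluated by reconstructing every pattern from the neighborhood of its own latent point; note that this naive version lets each latent point appear in its own neighborhood, while a leave-one-out variant would exclude it:

```python
def dsre(X_latent, Y_patterns, K):
    """E(X) = (1/N) * ||Y - f_UNN(.; X)||_F^2 (Equation (4))."""
    N = Y_patterns.shape[1]
    # Reconstruct each pattern y_i from the neighborhood of its latent point x_i
    reconstruction = np.column_stack(
        [unn_regression(X_latent[:, i], X_latent, Y_patterns, K)
         for i in range(N)]
    )
    return np.linalg.norm(Y_patterns - reconstruction, ord="fro") ** 2 / N
```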
3.2 Iterative Strategy
For KNN, it is not the absolute positions of the data samples in latent space that are relevant, but the relative positions that define the neighborhood relations. This perspective reduces the problem to a combinatorial search for neighborhoods $N_K(\mathbf{x}_i, \mathbf{X})$ with $i = 1, \ldots, N$, which can be solved by testing all combinations of $K$-element subsets of $N$ elements. The problem is still difficult to solve, in particular for high dimensions.
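As a rough illustration (with an arbitrary choice of $K = 5$, not from the text), the number of $K$-element subsets of $N$ elements grows quickly:

```python
from math import comb

for N in (10, 20, 50):
    print(N, comb(N, 5))  # 252, 15504, and 2118760 candidate neighborhoods
```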
The idea of the basic iterative UNN strategy is to iteratively assign the data samples
to a position in an existing latent space topology that leads to the lowest DSRE. We
assume fixed neighborhood topologies with equidistant positions in latent space, and
therefore restrict the optimization problem of Equation (4) to a search on a discrete
latent topology.
As a simple variant we consider the linear case $q = 1$ with latent variables arranged equidistantly on a line, i.e., $x \in \mathbb{R}$. In this simplified case only the order of the elements is important. The first iterative strategy works as follows (a code sketch follows the list):
1. Choose a pattern $\mathbf{y}$ from matrix $\mathbf{Y}$,
2. test all $\hat{N} + 1$ intermediate positions of the $\hat{N} = |\hat{\mathbf{Y}}|$ embedded elements $\hat{\mathbf{Y}}$ in latent space, and
3. choose the latent position that leads to the lowest DSRE.
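Putting the steps together, a sketch of this first iterative strategy under the stated assumptions ($q = 1$, equidistant latent positions on a line, patterns embedded in the order in which they appear); iterative_unn is an illustrative name, and the DSRE is evaluated with the dsre sketch above:

```python
def iterative_unn(Y_patterns, K):
    """Greedy embedding: insert each pattern at the intermediate
    position of the current latent order with the lowest DSRE."""
    N = Y_patterns.shape[1]
    order = []                              # indices of embedded patterns
    for j in range(N):                      # step 1: choose a pattern y
        best_order, best_err = None, np.inf
        for pos in range(len(order) + 1):   # step 2: all intermediate positions
            cand = order[:pos] + [j] + order[pos:]
            X = np.arange(len(cand), dtype=float)[None, :]  # equidistant line
            err = dsre(X, Y_patterns[:, cand], K)
            if err < best_err:              # step 3: keep the lowest-DSRE slot
                best_order, best_err = cand, err
        order = best_order
    return order                            # latent order of all N patterns
```

This greedy construction requires $N(N+1)/2$ DSRE evaluations in total, whereas testing all $N!$ orders of the linear latent topology is infeasible already for small $N$.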