with f leading to the complete vector y_j that can be embedded as usual, depending on the employed approach (UNN, UNN_g, etc.). To summarize, repair-and-embed works as follows:
1. Choose the incomplete pattern y_j from Y with the minimal number of missing entries,
2. employ Y_{-i} as training patterns for the prediction of y_{ij}, and predict with KNN regression (yielding the complete y_j),
3. add y_j to Y, and start from 1. until all patterns are complete,
4. embed all Y with the UNN approach.
As KNN regression is a non-parametric method, no training is necessary; only K has to be chosen carefully.
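The repair step above can be sketched in NumPy. This is an illustrative sketch, not the book's implementation: the function name `knn_impute` and the toy data are assumptions. The missing i-th entry is predicted as the mean of the i-th entries of the K patterns nearest in the remaining dimensions, as the KNN regression step describes.

```python
import numpy as np

def knn_impute(Y_complete, y_incomplete, i, K=2):
    """Repair step (sketch): predict the missing i-th entry of y_incomplete
    by KNN regression on the remaining dimensions (Y_{-i})."""
    mask = np.arange(Y_complete.shape[1]) != i       # all dimensions except i
    dists = np.linalg.norm(Y_complete[:, mask] - y_incomplete[mask], axis=1)
    neighbors = np.argsort(dists)[:K]                # K nearest complete patterns
    y_repaired = y_incomplete.copy()
    y_repaired[i] = Y_complete[neighbors, i].mean()  # KNN estimate closes the gap
    return y_repaired

# toy usage: repair the second entry (i = 1) of an incomplete pattern
Y = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([1.4, np.nan])
print(knn_impute(Y, y, i=1))  # -> [1.4 1.5]
```

The two nearest patterns in the first dimension are (1, 1) and (2, 2), so the gap is closed with their mean in the second dimension, 1.5.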
5.3 Embed-and-Repair
The second variant for embedding incomplete data is to embed a vector y_j with missing entries at dimension i while ignoring the i-th dimension during the embedding (i.e., during the computation of the DSRE), minimizing:

E_{-i}(X) = (1/N) ‖ Y_{-i} − f_UNN(x; X)_{-i} ‖_F .   (7)
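A minimal sketch of evaluating this partial error: the KNN-mean reconstruction below is a stand-in for f_UNN (an assumption made for illustration only, since the full UNN reconstruction is defined elsewhere in the text), and `dsre_minus_i` is a hypothetical name. The point is that dimension i is dropped from the Frobenius error.

```python
import numpy as np

def dsre_minus_i(Y, X, i, K=2):
    """Partial DSRE E_{-i}(X): reconstruction error with dimension i ignored.
    The KNN-mean reconstruction stands in for f_UNN (illustrative assumption)."""
    N = Y.shape[0]
    recon = np.empty_like(Y)
    for n in range(N):
        d = np.linalg.norm(X - X[n], axis=1)  # latent-space distances
        d[n] = np.inf                         # exclude the pattern itself
        recon[n] = Y[np.argsort(d)[:K]].mean(axis=0)
    keep = np.arange(Y.shape[1]) != i         # drop the i-th data dimension
    return np.linalg.norm(Y[:, keep] - recon[:, keep]) / N

# toy usage: error over dimension 0 only, with dimension i = 1 ignored
Y = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
X = np.array([[0.0], [1.0], [2.0]])
err = dsre_minus_i(Y, X, i=1, K=1)
```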
The approach starts with the complete vectors y from Y, and then iteratively processes the incomplete patterns with an increasing number of missing values. Starting the dimensionality reduction with complete patterns is reasonable to get as close as possible to the structure of the complete embedding. Embed-and-repair is a greedy approach that only considers the locally best embedding w.r.t. the available information. As the embedded pattern has to be completed to allow embeddings of further patterns, the gaps are closed with entries that ensure that the embedding is minimal w.r.t. the overall DSRE. This is obviously the average of the K nearest points for dimension i, i.e., the nearest neighbor estimation f_KNN(x_i), see Equation 2.
Figure 9 illustrates the embed-and-repair strategy for neighborhood size K = 2. Pattern y = (y_1, ·) is incomplete. It is embedded at the position where it leads to the lowest DSRE w.r.t. the first dimension: between the two embedded neighbors shown in Figure 9. Then, the gap is filled with the mean of the second dimension of these two neighboring patterns y′ and y′′, yielding (y_1, 0.5(y′_2 + y′′_2)). To summarize, embed-and-repair works as follows:
1. Choose the incomplete pattern y_j from Y with the minimal number of missing entries,
2. embed y_j with UNN/UNN_g, minimizing E_{-i}(X),
3. add x_j to X,
4. complete y_j w.r.t. KNN based on X (see Equation 2),
5. add the completed y_j to Y,
6. start from 1. until no incomplete patterns remain.
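The steps above can be sketched as a single embed-and-repair update. This is a simplification under stated assumptions: the candidate-grid search stands in for the UNN embedding procedure, and the names `embed_and_repair` and `candidates` are illustrative, not from the text.

```python
import numpy as np

def embed_and_repair(Y_done, X_done, y, i, candidates, K=2):
    """One embed-and-repair step (sketch): embed the incomplete pattern y
    (missing dimension i) at the candidate latent position with the lowest
    partial reconstruction error, then close the gap with the mean of the
    i-th entries of the K latent-space neighbors (the f_KNN estimate)."""
    keep = np.arange(Y_done.shape[1]) != i
    best_x, best_err = None, np.inf
    for x in candidates:                      # simplified stand-in for the
        d = np.linalg.norm(X_done - x, axis=1)  # UNN latent-position search
        nb = np.argsort(d)[:K]
        err = np.sum((Y_done[nb][:, keep].mean(axis=0) - y[keep]) ** 2)
        if err < best_err:
            best_x, best_err = x, err
    nb = np.argsort(np.linalg.norm(X_done - best_x, axis=1))[:K]
    y_filled = y.copy()
    y_filled[i] = Y_done[nb, i].mean()        # repair via latent neighbors
    return best_x, y_filled

# toy usage with K = 2, mirroring the Figure 9 setting
Y_done = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
X_done = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.4, np.nan])
x_best, y_filled = embed_and_repair(Y_done, X_done, y, i=1,
                                    candidates=np.array([[0.5], [1.5], [2.5]]))
```

The incomplete pattern lands at the latent position whose two neighbors best reconstruct its available first dimension, and its missing entry is then the mean of those neighbors' second entries.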
The difference between KNN imputation and embed-and-repair imputation is that the embed-and-repair KNN prediction is based on neighborhoods in latent space. Hence, it is a dimensionality-reduction-oriented imputation method based on characteristics introduced by UNN regression.