with f leading to the complete vector y_j that can be embedded as usual, depending on the employed approach (UNN, UNN_g, etc.). To summarize, repair-and-embed works as follows:
1. Choose the incomplete pattern y_j from Y with the minimal number of missing entries,
2. employ Y_{-i} as training patterns for the prediction of y_{ij}, and predict with KNN regression (yielding the complete y_j),
3. add y_j to Y, and start from 1. until all patterns are complete,
4. embed all Y with the UNN approach.
As KNN regression is a non-parametric method, no training is necessary; only K has to be chosen carefully.
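The repair step above can be sketched in NumPy. This is an illustrative sketch, not the book's implementation: the function name `knn_impute` and the toy data are assumptions. The missing i-th entry is predicted as the mean of the i-th entries of the K patterns nearest in the remaining dimensions, as the KNN regression step describes.

```python
import numpy as np

def knn_impute(Y_complete, y_incomplete, i, K=2):
    """Repair step (sketch): predict the missing i-th entry of y_incomplete
    by KNN regression on the remaining dimensions (Y_{-i})."""
    mask = np.arange(Y_complete.shape[1]) != i       # all dimensions except i
    dists = np.linalg.norm(Y_complete[:, mask] - y_incomplete[mask], axis=1)
    neighbors = np.argsort(dists)[:K]                # K nearest complete patterns
    y_repaired = y_incomplete.copy()
    y_repaired[i] = Y_complete[neighbors, i].mean()  # KNN estimate closes the gap
    return y_repaired

# toy usage: repair the second entry (i = 1) of an incomplete pattern
Y = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([1.4, np.nan])
print(knn_impute(Y, y, i=1))  # -> [1.4 1.5]
```

The two nearest patterns in the first dimension are (1, 1) and (2, 2), so the gap is closed with their mean in the second dimension, 1.5.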
5.3 Embed-and-Repair
The second variant for embedding incomplete data is to embed a vector y_j with missing entries at dimension i while ignoring the i-th dimension during the embedding (i.e., during the computation of the DSRE), minimizing:

E_{-i}(X) = (1/N) ‖ Y_{-i} − f_UNN(x; X)_{-i} ‖_F .   (7)
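A minimal sketch of evaluating this partial error: the KNN-mean reconstruction below is a stand-in for f_UNN (an assumption made for illustration only, since the full UNN reconstruction is defined elsewhere in the text), and `dsre_minus_i` is a hypothetical name. The point is that dimension i is dropped from the Frobenius error.

```python
import numpy as np

def dsre_minus_i(Y, X, i, K=2):
    """Partial DSRE E_{-i}(X): reconstruction error with dimension i ignored.
    The KNN-mean reconstruction stands in for f_UNN (illustrative assumption)."""
    N = Y.shape[0]
    recon = np.empty_like(Y)
    for n in range(N):
        d = np.linalg.norm(X - X[n], axis=1)  # latent-space distances
        d[n] = np.inf                         # exclude the pattern itself
        recon[n] = Y[np.argsort(d)[:K]].mean(axis=0)
    keep = np.arange(Y.shape[1]) != i         # drop the i-th data dimension
    return np.linalg.norm(Y[:, keep] - recon[:, keep]) / N

# toy usage: error over dimension 0 only, with dimension i = 1 ignored
Y = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
X = np.array([[0.0], [1.0], [2.0]])
err = dsre_minus_i(Y, X, i=1, K=1)
```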
The approach starts with the complete vectors y from Y, and then iteratively processes the incomplete patterns with an increasing number of missing values. Starting the dimensionality reduction with complete patterns is reasonable to get as close as possible to the structure of the complete embedding. Embed-and-repair is a greedy approach that only considers the locally best embedding w.r.t. the available information. As the embedded pattern has to be completed to allow embeddings of further patterns, the gaps are closed with entries that ensure that the embedding is minimal w.r.t. the overall DSRE. This is obviously the average of the K nearest points for dimension i, i.e., the nearest neighbor estimation f_KNN(x_i), see Equation 2.
Figure 9 illustrates the embed-and-repair strategy for neighborhood size K = 2. Pattern y = (y_1, ·) is incomplete. It is embedded at the position where it leads to the lowest DSRE w.r.t. the first dimension: between the two embedded neighbors shown in Figure 9. Then, the gap is filled with the mean of the second dimension of these two neighboring patterns y′ and y′′, yielding (y_1, 0.5(y′_2 + y′′_2)). To summarize, embed-and-repair works as follows:
1. Choose the incomplete pattern y_j from Y with the minimal number of missing entries,
2. embed y_j with UNN/UNN_g, minimizing E_{-i}(X),
3. add x_j to X,
4. complete y_j w.r.t. KNN based on X (see Equation 2),
5. add the completed y_j to Y,
6. start from 1. until no incomplete patterns remain.
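The steps above can be sketched as a single embed-and-repair update. This is a simplification under stated assumptions: the candidate-grid search stands in for the UNN embedding procedure, and the names `embed_and_repair` and `candidates` are illustrative, not from the text.

```python
import numpy as np

def embed_and_repair(Y_done, X_done, y, i, candidates, K=2):
    """One embed-and-repair step (sketch): embed the incomplete pattern y
    (missing dimension i) at the candidate latent position with the lowest
    partial reconstruction error, then close the gap with the mean of the
    i-th entries of the K latent-space neighbors (the f_KNN estimate)."""
    keep = np.arange(Y_done.shape[1]) != i
    best_x, best_err = None, np.inf
    for x in candidates:                      # simplified stand-in for the
        d = np.linalg.norm(X_done - x, axis=1)  # UNN latent-position search
        nb = np.argsort(d)[:K]
        err = np.sum((Y_done[nb][:, keep].mean(axis=0) - y[keep]) ** 2)
        if err < best_err:
            best_x, best_err = x, err
    nb = np.argsort(np.linalg.norm(X_done - best_x, axis=1))[:K]
    y_filled = y.copy()
    y_filled[i] = Y_done[nb, i].mean()        # repair via latent neighbors
    return best_x, y_filled

# toy usage with K = 2, mirroring the Figure 9 setting
Y_done = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
X_done = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.4, np.nan])
x_best, y_filled = embed_and_repair(Y_done, X_done, y, i=1,
                                    candidates=np.array([[0.5], [1.5], [2.5]]))
```

The incomplete pattern lands at the latent position whose two neighbors best reconstruct its available first dimension, and its missing entry is then the mean of those neighbors' second entries.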
The difference between KNN imputation and embed-and-repair imputation is that the embed-and-repair KNN prediction is based on neighborhoods in latent space. Hence, it is a dimensionality-reduction-oriented imputation method based on characteristics introduced by UNN regression.