Graphics Reference
In-Depth Information
the most common value among all neighbors is taken, and for numerical values the
average value is used. Therefore, a proximity measure between instances is needed
for it to be defined. The Euclidean distance (it is a case of a L p norm distance) is the
most commonly used in the literature.
In order to estimate a MV y ih in the i th example vector y i by KNNI [ 6 ], we first
select K examples whose attribute values are similar to y i . Next, the MV is estimated
as the average of the corresponding entries in the selected K expression vectors.
When there are other MVs in y i and/or y j , their treatment requires some heuristics.
The missing entry y ih is estimated as average:
j I Kih y jh
|
y
ih =
,
(4.26)
I Kih |
where I Kih is now the index set of KNN examples of the i th example, and if y jh
is missing the j th attribute is excluded from I Kih . Note that KNNI has no theoret-
ical criteria for selecting the best K-value and the K-value has to be determined
empirically.
4.5.2 Weighted Imputation with K-Nearest Neighbour (WKNNI)
The Weighted KNN method [ 93 ] selects the instances with similar values (in terms
of distance) to incomplete instance, so it can impute as KNNI does. However, the
estimated value now takes into account the different distances to the neighbors, using
a weighted mean or the most repeated value according to a similarity measure. The
similarity measure s i (
between two examples y i and y j is defined by the Euclidian
distance calculated over observed attributes in y i . Next we define the measure as
follows:
y j )
2
1
/
s i =
O i O j (
y ih
y jh )
,
(4.27)
h i
where O i ={
.
The missing entry y ih is estimated as average weighted by the similarity measure:
h
|
the h th component of y i is observed
}
j I Kih s i (
y j )
y jh
y ih =
j I Kih s i (
,
(4.28)
y j )
where I Kih is the index set of KNN examples of the i th example, and if y jh is missing
the j th attribute is excluded from I Kih . Note that KNNI has no theoretical criteria for
selecting the best K-value and the K-value has to be determined empirically.
 
 
Search WWH ::




Custom Search