Once the clusters are stable, we take the examples with MVs. Now we use the distance $D(x_i, C_j) = \min_{x_j \in C_j} d(x_i, x_j)$ to find the nearest cluster $C_i$ to such an instance. From this cluster, with $S$ denoting its set of instances, we compute the centroid $\bar{x}$ such that

$$\bar{x} = \operatorname*{arg\,min}_{x_j \in S} \sum_{x_i \in S} d(x_i, x_j), \qquad D(\bar{x}, C_i) < D(x_i, C_i) \qquad (4.61)$$

for all instances $x_i$ of the cluster $C_i$. Once the centroid is obtained, the MV of the example is imputed with the value of the proper attribute of $\bar{x}$.
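As an illustration, the following is a minimal sketch of this imputation step under our own assumptions: Euclidean distance computed over the observed attributes only, and the cluster centroids already available as a NumPy array. The function name `kmeans_impute` is hypothetical; the text does not prescribe an implementation.

```python
import numpy as np

def kmeans_impute(X, centroids):
    """Fill each MV (NaN) with the matching attribute of the nearest
    centroid; nearness is judged only on the attributes the
    incomplete example actually has."""
    X_imp = X.copy()
    incomplete = np.argwhere(np.isnan(X).any(axis=1)).ravel()
    for i in incomplete:
        obs = ~np.isnan(X[i])                                  # observed attributes
        d = np.linalg.norm(centroids[:, obs] - X[i, obs], axis=1)
        nearest = centroids[np.argmin(d)]                      # closest centroid
        X_imp[i, ~obs] = nearest[~obs]                         # impute its values
    return X_imp

# Example: two 2-D centroids, one example missing attribute 1
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
X = np.array([[4.8, np.nan]])
print(kmeans_impute(X, centroids))                             # -> [[4.8, 5.0]]
```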
4.5.7 Singular Value Decomposition Imputation (SVDI)
In this method, we employ singular value decomposition (4.62) to obtain a set of mutually orthogonal expression patterns that can be linearly combined to approximate the values of all attributes in the data set [93]. These patterns, which in this case are identical to the principal components of the data matrix, are further referred to as eigenvectors.
$$A_{m \times n} = U_{m \times m}\, \Sigma_{m \times n}\, V^{T}_{n \times n}. \qquad (4.62)$$
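For concreteness, the factorization in (4.62) can be computed with NumPy's SVD routine; a minimal sketch (the variable names are ours):

```python
import numpy as np

A = np.random.rand(6, 4)                  # m = 6 examples, n = 4 attributes
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((6, 4))                  # Sigma is m x n, diagonal
np.fill_diagonal(Sigma, s)                # s holds the singular values
assert np.allclose(A, U @ Sigma @ Vt)     # A = U Sigma V^T holds
print(U.shape, Sigma.shape, Vt.shape)     # (6, 6) (6, 4) (4, 4)
```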
Matrix $V^T$ now contains the eigenvectors, whose contribution to the expression in the eigenspace is quantified by the corresponding singular values on the diagonal of matrix $\Sigma$. We then identify the most significant eigenvectors by sorting them on their corresponding singular values. Although it has been shown that a few significant eigenvectors are sufficient to describe most of the expression data, the exact fraction of eigenvectors best for estimation needs to be determined empirically.
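Since `np.linalg.svd` returns the singular values in descending order, selecting the most significant eigenvectors reduces to slicing the rows of $V^T$. A sketch on a complete (MV-free) matrix, with a hypothetical 90% cut-off standing in for the empirically chosen fraction:

```python
import numpy as np

A = np.random.rand(6, 4)                          # any complete matrix
_, s, Vt = np.linalg.svd(A, full_matrices=False)
explained = np.cumsum(s**2) / np.sum(s**2)        # cumulative fraction described
k = int(np.searchsorted(explained, 0.90)) + 1     # hypothetical 90% cut-off
V_k = Vt[:k]                                      # k most significant eigenvectors
```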
Once the $k$ most significant eigenvectors from $V^T$ are selected, we estimate a MV $j$ in example $i$ by first regressing this attribute value against the $k$ eigenvectors and then using the coefficients of the regression to reconstruct $j$ from a linear combination of the $k$ eigenvectors. The $j$th value of example $i$ and the $j$th values of the $k$ eigenvectors are not used in determining these regression coefficients. It should be noted that SVD can only be performed on complete matrices; therefore we initially substitute the row average for all MVs in matrix $A$, obtaining $A'$. We then utilize a Regularized EM method to arrive at the final estimate, as follows. Each MV in $A'$ is estimated using the above algorithm, and the procedure is then repeated on the newly obtained matrix until the total change in the matrix falls below the empirically determined (by the authors of [93]) threshold of 0.01 (noted as the stagnation tolerance in the EM algorithm). The other parameters of the EM algorithm are the same for both algorithms.
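Putting the steps together, the following is a minimal sketch of the iterative estimate, under our own simplifying assumptions: ordinary least squares in place of the regularized EM regression, the Frobenius norm as the measure of total change, and the 0.01 stagnation tolerance from [93]. The function name `svd_impute` is hypothetical.

```python
import numpy as np

def svd_impute(A, k=2, tol=0.01, max_iter=100):
    """Iterative SVDI sketch: fill MVs with row averages, then
    repeatedly re-estimate each MV by regressing the example's
    observed values on the k most significant eigenvectors."""
    miss = np.isnan(A)
    A_hat = np.where(miss, np.nanmean(A, axis=1)[:, None], A)  # A'
    for _ in range(max_iter):
        A_prev = A_hat.copy()
        _, _, Vt = np.linalg.svd(A_hat, full_matrices=False)
        V_k = Vt[:k]                             # k most significant eigenvectors
        for i, j in zip(*np.nonzero(miss)):
            use = np.arange(A.shape[1]) != j     # leave the jth values out
            coef, *_ = np.linalg.lstsq(V_k[:, use].T, A_hat[i, use], rcond=None)
            A_hat[i, j] = coef @ V_k[:, j]       # reconstruct j from the combination
        if np.linalg.norm(A_hat - A_prev) < tol: # stagnation tolerance
            break
    return A_hat
```

In this sketch $k$ and the tolerance remain the free parameters that, as noted above, must be chosen empirically.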
4.5.8 Local Least Squares Imputation (LLSI)
In this method, proposed in [49], a target instance that has MVs is represented as a linear combination of similar instances. Rather than using all available instances in the data, only similar instances based on a similarity measure are used, and for that