a better description tool when the clusters are not well separated, as is the case in
missing data imputation. Moreover, the original K-means clustering may be trapped
in a local minimum if the initial points are not selected properly. However, the
continuous membership values in fuzzy clustering make the resulting algorithms less
likely to get stuck in a local minimum.
In fuzzy clustering, each data object $x_i$ has a membership function that describes
the degree to which this data object belongs to a given cluster $v_k$. The membership
function is defined in the next equation:
$$U(v_k, x_i) = \frac{d(v_k, x_i)^{-2/(m-1)}}{\sum_{j=1}^{K} d(v_j, x_i)^{-2/(m-1)}} \tag{4.29}$$

where $d(\cdot,\cdot)$ is the distance between a centroid and a data object, $m > 1$ is the
fuzzifier, and $\sum_{j=1}^{K} U(v_j, x_i) = 1$ for any data object $x_i$ ($1 \le i \le N$).
Consequently, the cluster centroids can no longer be computed simply as mean values;
the membership degree of each data object must be taken into account. Equation (4.30)
gives the formula for computing the cluster centroids:

$$v_k = \frac{\sum_{i=1}^{N} U(v_k, x_i) \times x_i}{\sum_{i=1}^{N} U(v_k, x_i)} \tag{4.30}$$
Since incomplete objects contain missing values, only the reference attributes are
used to compute the cluster centroids.
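As a quick numeric check of Eq. (4.29), take the hypothetical values $m = 2$, $K = 2$, and an object $x$ whose distances to the two centroids are $d(v_1, x) = 1$ and $d(v_2, x) = 2$. With $m = 2$ the exponent $-2/(m-1)$ equals $-2$, so:

$$
U(v_1, x) = \frac{1^{-2}}{1^{-2} + 2^{-2}} = \frac{1}{1.25} = 0.8, \qquad
U(v_2, x) = \frac{2^{-2}}{1^{-2} + 2^{-2}} = \frac{0.25}{1.25} = 0.2
$$

The two memberships sum to 1, and the nearer centroid receives four times the weight of the farther one, because for $m = 2$ the weights scale as $d^{-2}$. In Eq. (4.30) this object therefore contributes $0.8\,x$ to the numerator of $v_1$ and $0.2\,x$ to that of $v_2$, rather than being assigned wholly to $v_1$.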
The algorithm for missing data imputation with the fuzzy K-means clustering method
also consists of three processes. Note that in the initialization process, we pick K
centroids that are evenly distributed, to avoid a local minimum. In the second process,
we iteratively update the membership functions and centroids until the overall distance
meets the user-specified distance threshold $\varepsilon$. In this process, a data object
cannot be assigned to a single cluster represented by one centroid (as is done in the
basic K-means clustering algorithm), because each data object belongs to all K clusters
with different membership degrees. Finally, we impute the non-reference attributes of
each incomplete data object $x_i$, based on its membership degrees and the values of the
cluster centroids, as shown in the next equation:

$$x_{i,j} = \sum_{k=1}^{K} U(v_k, x_i) \times v_{k,j}, \quad \text{for any non-reference attribute } j \notin R \tag{4.31}$$
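The three processes can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the text does not state: missing values are represented as `None`; $d$ is the Euclidean distance over the attributes observed in each object (read here as that object's reference attributes); the "evenly distributed" initial centroids are spread evenly across each attribute's observed range; and the stopping rule compares the total centroid movement between iterations with $\varepsilon$. The function names `memberships` and `fkmi` are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def memberships(x: Row, centroids: List[List[float]], m: float) -> List[float]:
    """Eq. (4.29), assuming d is Euclidean over the attributes observed in x."""
    ref = [j for j, val in enumerate(x) if val is not None]
    d = [math.sqrt(sum((x[j] - v[j]) ** 2 for j in ref)) for v in centroids]
    hits = [k for k, dk in enumerate(d) if dk == 0.0]
    if hits:
        # Assumption: Eq. (4.29) is undefined when x coincides with a centroid,
        # so full membership is split evenly among the coinciding centroids.
        return [1.0 / len(hits) if k in hits else 0.0 for k in range(len(d))]
    w = [dk ** (-2.0 / (m - 1.0)) for dk in d]
    total = sum(w)
    return [wk / total for wk in w]


def fkmi(data: List[Row], K: int, m: float = 2.0, eps: float = 1e-6,
         max_iter: int = 100) -> List[List[float]]:
    n_attr = len(data[0])
    # Indices of the objects in which each attribute is observed.
    observed = [[i for i, x in enumerate(data) if x[j] is not None]
                for j in range(n_attr)]

    # Process 1 (assumed scheme): place K centroids evenly across each
    # attribute's observed range.
    lo = [min(data[i][j] for i in observed[j]) for j in range(n_attr)]
    hi = [max(data[i][j] for i in observed[j]) for j in range(n_attr)]
    centroids = [[lo[j] + (k + 0.5) * (hi[j] - lo[j]) / K for j in range(n_attr)]
                 for k in range(K)]

    # Process 2: alternate Eq. (4.29) and Eq. (4.30) until the total centroid
    # movement drops below eps (assumed reading of the stopping rule).
    for _ in range(max_iter):
        U = [memberships(x, centroids, m) for x in data]  # U[i][k] = U(v_k, x_i)
        new = []
        for k in range(K):
            c = []
            for j in range(n_attr):
                # Centroid coordinate j uses only objects where j is observed.
                w = sum(U[i][k] for i in observed[j])
                num = sum(U[i][k] * data[i][j] for i in observed[j])
                c.append(num / w if w > 0 else centroids[k][j])
            new.append(c)
        shift = sum(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < eps:
            break

    # Process 3: Eq. (4.31), fill each missing value with the
    # membership-weighted average of the centroid values.
    U = [memberships(x, centroids, m) for x in data]
    return [[x[j] if x[j] is not None
             else sum(U[i][k] * centroids[k][j] for k in range(K))
             for j in range(n_attr)]
            for i, x in enumerate(data)]
```

For example, with `data = [[1.0, 2.0], [1.2, 1.9], [8.0, 9.0], [8.1, None]]` and `K = 2`, the missing value in the last row is filled with a value close to 9.0, because that object belongs almost entirely to the cluster around (8, 9). Observed values are returned unchanged; only the non-reference attributes are imputed.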
4.5.5 Support Vector Machines Imputation (SVMI)
Support Vector Machines Imputation [29] is an SVM-regression-based algorithm
to fill in MVs, i.e., set the decision attributes (output or classes) as the condition
 
 