a better description tool when the clusters are not well-separated, as is the case in
missing data imputation. Moreover, the original K-means clustering may be trapped
in a local minimum if the initial points are not selected properly. In contrast, the
continuous membership values in fuzzy clustering make the resulting algorithms less
likely to get stuck in a local minimum.
In fuzzy clustering, each data object $x_i$ has a membership function that describes
the degree to which it belongs to a given cluster $v_k$. The membership function is
defined as follows:
$$U(v_k, x_i) = \frac{d(v_k, x_i)^{-2/(m-1)}}{\sum_{j=1}^{K} d(v_j, x_i)^{-2/(m-1)}} \qquad (4.29)$$

where $m > 1$ is the fuzzifier, and $\sum_{j=1}^{K} U(v_j, x_i) = 1$ for any data object
$x_i$ ($1 \le i \le N$). Consequently, the cluster centroids can no longer be computed as
simple mean values. Instead, the membership degree of each data object must be taken
into account. Equation (4.30) gives the formula for computing the cluster centroids:

$$v_k = \frac{\sum_{i=1}^{N} U(v_k, x_i) \times x_i}{\sum_{i=1}^{N} U(v_k, x_i)} \qquad (4.30)$$
Because incomplete objects contain missing values, only the reference attributes are
used to compute the cluster centroids.
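A small worked example makes Equations (4.29) and (4.30) concrete. The distances, data values and membership degrees below are hypothetical numbers chosen for illustration, with fuzzifier $m = 2$ (so the exponent $-2/(m-1)$ equals $-2$) and $K = 2$ clusters.

```latex
% Eq. (4.29), hypothetical distances d(v_1, x_i) = 1 and d(v_2, x_i) = 2, m = 2:
U(v_1, x_i) = \frac{1^{-2}}{1^{-2} + 2^{-2}} = \frac{1}{1.25} = 0.8, \qquad
U(v_2, x_i) = \frac{2^{-2}}{1^{-2} + 2^{-2}} = \frac{0.25}{1.25} = 0.2

% Eq. (4.30), one attribute, N = 3 hypothetical objects x = 0, 1, 4
% with memberships U(v_k, x_i) = 0.9, 0.6, 0.1 in cluster k:
v_k = \frac{0.9 \cdot 0 + 0.6 \cdot 1 + 0.1 \cdot 4}{0.9 + 0.6 + 0.1} = \frac{1.0}{1.6} = 0.625
```

In this example the nearer centroid receives four times the weight of the farther one, and the memberships of $x_i$ sum to 1. The fuzzy centroid 0.625 is pulled toward the objects with high membership, whereas the plain mean of 0, 1 and 4 would be $5/3 \approx 1.67$.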
The algorithm for missing data imputation with the fuzzy K-means clustering method
also has three processes. In the initialization process, we pick $K$ evenly distributed
centroids to avoid local minima. In the second process, we iteratively update the
membership functions and centroids until the overall distance meets the user-specified
distance threshold $\varepsilon$. In this process, a data object cannot be assigned to a
single cluster represented by one centroid (as is done in the basic K-means clustering
algorithm), because each data object belongs to all $K$ clusters with different
membership degrees. Finally, we impute the non-reference attributes of each incomplete
data object $x_i$ from its membership degrees and the values of the cluster centroids,
as shown in the next equation:

$$x_{i,j} = \sum_{k=1}^{K} U(v_k, x_i) \times v_{k,j}, \quad \text{for any non-reference attribute } j \in R \qquad (4.31)$$
4.5.5 Support Vector Machines Imputation (SVMI)
Support Vector Machines Imputation [29] is an SVM regression-based algorithm
to fill in MVs, i.e. set the decision attributes (output or classes) as the condition