Proof. The gradient of $d(x, c) = \langle (x - c),\, Q(x - c) \rangle^{1/2}$ with respect to $c$ is

$$
\nabla_c \, \langle (x - c),\, Q(x - c) \rangle^{1/2}
= -\frac{Q(x - c)}{\langle (x - c),\, Q(x - c) \rangle^{1/2}}
= -\frac{Q(x - c)}{d(x, c)}, \qquad (2.27)
$$

assuming $x \neq c$. Therefore, if $c_1, c_2$ do not coincide with any of the data points $x_i$, we have
$$
\nabla_{c_k} f(c_1, c_2)
= -\,Q_k \sum_{i=1}^{N} \frac{p_k(x_i)^2\, q_k}{d_k(x_i, c_k)}\,(x_i - c_k). \qquad (2.28)
$$
Setting the gradient equal to zero, "cancelling" the matrix $Q_k$ and the common factor $q_k$, and summing like terms, we get
$$
\sum_{i=1}^{N} \frac{p_k(x_i)^2}{d_k(x_i, c_k)}\, x_i
= \left( \sum_{i=1}^{N} \frac{p_k(x_i)^2}{d_k(x_i, c_k)} \right) c_k,
$$
proving (2.24) and (2.26). Substituting (2.7) in (2.26) then gives (2.25).

Note: The theorem also holds if a center coincides with a data point, if we interpret $\infty/\infty$ as 1 in (2.24).
Theorem 2.2 applies, in particular, to the Mahalanobis distance (2.4),

$$
d(x, c_k) = \left( (x - c_k)^T\, \Sigma_k^{-1}\, (x - c_k) \right)^{1/2},
$$

where $\Sigma_k$ is the (given or computed) covariance matrix of the cluster $C_k$.
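As a small numerical companion (an illustration with made-up data, not from the text), the Mahalanobis distance can be computed by solving with $\Sigma_k$ rather than forming its inverse explicitly:

```python
import numpy as np

def mahalanobis(x, c, Sigma):
    # Mahalanobis distance (2.4): ((x - c)^T Sigma^{-1} (x - c))^(1/2),
    # the elliptical distance with Q = Sigma^{-1}
    r = x - c
    return float(np.sqrt(r @ np.linalg.solve(Sigma, r)))

# illustrative toy cluster; Sigma_k estimated from the sample points
pts = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [1.0, 1.5]])
c = pts.mean(axis=0)
Sigma = np.cov(pts.T)
print(mahalanobis(np.array([2.0, 2.0]), c, Sigma))
```

With $\Sigma = I$ the formula reduces to the ordinary Euclidean distance.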
For the general case of $K$ clusters it is convenient to use the probabilistic form (2.26).
Corollary 2.1. Consider a function of $K$ centers,

$$
f(c_1, c_2, \ldots, c_K)
= \sum_{k=1}^{K} \sum_{i=1}^{N} p_k(x_i)^2\, q_k\, d_k(x_i, c_k), \qquad (2.29)
$$

an analog of (2.21). Then, under the hypotheses of Theorem 2.2, the minimizers of $f$ are
$$
c_k = \sum_{i=1}^{N} \frac{u_k(x_i)}{\sum_{t=1}^{N} u_k(x_t)}\, x_i,
\quad \text{with } u_k(x_i) = \frac{p_k(x_i)^2}{d_k(x_i, c_k)}, \qquad (2.30)
$$

for $k = 1, \ldots, K$.
Proof. Same as the proof of Theorem 2.2.
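Formula (2.30) suggests a simple fixed-point pass over the centers. Below is a minimal sketch, assuming Euclidean distances $d_k$ (i.e. $Q_k = I$) and externally supplied probabilities $p_k(x_i)$; the function name and data are illustrative, not from the text:

```python
import numpy as np

def update_centers(X, C, p):
    """One pass of the fixed-point formula (2.30):
        c_k <- sum_i u_k(x_i) x_i / sum_t u_k(x_t),
        u_k(x_i) = p_k(x_i)^2 / d_k(x_i, c_k).
    The constants q_k cancel in the normalized weights, so they are omitted.
    Assumes no center coincides with a data point, so d_k > 0.
    X: (N, n) data, C: (K, n) centers, p: (N, K) cluster probabilities."""
    newC = np.empty_like(C)
    for k in range(C.shape[0]):
        dk = np.linalg.norm(X - C[k], axis=1)  # Euclidean d_k (Q_k = I in this sketch)
        u = p[:, k] ** 2 / dk                  # weights u_k(x_i)
        newC[k] = u @ X / u.sum()              # weighted mean of the data
    return newC

# illustrative usage with uniform probabilities
X = np.array([[-1.0, 0.0], [1.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
C = np.array([[0.5, 0.0], [4.5, 0.0]])
p = np.full((4, 2), 0.5)
C = update_centers(X, C, p)
```

A center that is already the weighted mean of its data is a fixed point of this update, which is exactly the content of (2.30).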