where $df$ is the corresponding degree of freedom, having the value $(|E_c| - 1)(|E_k| - 1)$. A chi-squared test is then used to select interdependent variables in $X$ at a presumed significance level.
The cluster regrouping process uses an information measure to regroup data iteratively. Wong et al. have proposed an information measure called normalized surprisal (NS) to indicate the significance of joint information. Using this measure, the information conditioned by an observed event $x_k$ is weighted according to $R(X_k, C_k)$, its measure of interdependency with the cluster label variable. Therefore, the higher the interdependency of a conditioning event, the more relevant the event is. NS measures the joint information of a hypothesized value based on the selected set of significant components. It is defined as
$$\mathrm{NS}(a_{cj} \mid x(a_{cj})) = \frac{I(a_{cj} \mid x(a_{cj}))}{\sum_{k=1}^{m} R(X_k, C_k)} \qquad (4.58)$$
where $I(a_{cj} \mid x(a_{cj}))$ is the summation of the weighted conditional information defined on the incomplete probability distribution scheme as

$$I(a_{cj} \mid x(a_{cj})) = \sum_{k=1}^{m} R(X_k, C_k)\, I(a_{cj} \mid x_k) = \sum_{k=1}^{m} R(X_k, C_k) \left( -\log \frac{P(a_{cj} \mid x_k)}{\sum_{a_{cu} \in E_c} P(a_{cu} \mid x_k)} \right) \qquad (4.59)$$
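As a minimal sketch of how Eqs. 4.58 and 4.59 might be evaluated for one hypothesized label, the Python below takes the selected components as plain tuples. The function names and the tuple layout are hypothetical, the natural logarithm is an assumption (the text does not state the base), and the normalization by the sum of the weights follows the equation as recovered from the extracted text. All probabilities are assumed to be strictly positive.

```python
import math


def conditional_information(p_label, p_total):
    """I(a_cj | x_k): -log of P(a_cj | x_k) over the sum of P(a_cu | x_k).

    Assumes p_label > 0; natural log is an assumption, not from the text.
    """
    return -math.log(p_label / p_total)


def weighted_information(components):
    """Eq. 4.59: sum of R(X_k, C_k) * I(a_cj | x_k) over selected components.

    `components` is a hypothetical list of (R_k, p_label_k, p_total_k) tuples.
    """
    return sum(r * conditional_information(p, total)
               for r, p, total in components)


def normalized_surprisal(components):
    """Eq. 4.58 as read here: weighted information over the total weight."""
    total_weight = sum(r for r, _, _ in components)
    return weighted_information(components) / total_weight


# Two selected components with illustrative (made-up) weights and probabilities.
components = [(0.4, 0.6, 0.8), (0.1, 0.2, 0.8)]
ns = normalized_surprisal(components)
```

A component with a larger weight contributes more to the result, which mirrors the statement that a more interdependent conditioning event is more relevant.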
To ensure a meaningful calculation in the incomplete probability scheme formulation, $x_k$ is selected if

$$\sum_{a_{cu} \in E_c} P(a_{cu} \mid x_k) > T \qquad (4.60)$$

where $T \geq 0$ is a size threshold for meaningful estimation. NS can be used in a decision rule in the regrouping process. Let $C = \{a_{c1}, \ldots, a_{cq}\}$ be the set of possible cluster labels. We assign $a_{cj}$ to $x_e$ if

$$\mathrm{NS}(a_{cj} \mid x(a_{cj})) = \min_{a_{cu} \in C} \mathrm{NS}(a_{cu} \mid x(a_{cu})).$$
If no component is selected with respect to all hypothesized cluster labels, or if more than one label is associated with the same minimum NS, then the sample is assigned a dummy label, indicating that the estimated cluster label is still uncertain. Also, since zero probability may be encountered in the probability estimation, an unbiased probability estimate based on entropy minimax is used. In the regrouping algorithm, the cluster label for each sample is estimated iteratively until a stable set of label assignments is attained.
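The decision rule for a single sample could be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `assign_label`, `DUMMY`, and the dictionary layout of the conditional probabilities are hypothetical, and the natural logarithm is assumed. Components are selected with the threshold test of Eq. 4.60 only, the probabilities are assumed to be already strictly positive (for example after the entropy minimax estimate), and the weights are assumed to sum to a positive value. The outer loop that re-estimates probabilities until the labels stabilize is not shown.

```python
import math

DUMMY = None  # hypothetical marker meaning "cluster label still uncertain"


def assign_label(labels, cond_probs, weights, threshold=0.0):
    """Pick the label with minimum NS for one sample, or DUMMY if uncertain.

    cond_probs[k] maps each label to P(label | x_k) for component k;
    weights[k] is the interdependency weight R for component k.
    """
    # Eq. 4.60: keep component k only if its probabilities sum above T.
    selected = [(probs, r) for probs, r in zip(cond_probs, weights)
                if sum(probs.values()) > threshold]
    if not selected:
        return DUMMY  # no component selected for any hypothesized label

    total_weight = sum(r for _, r in selected)
    scores = {}
    for label in labels:
        # Weighted conditional information (Eq. 4.59), then normalized.
        info = sum(r * -math.log(probs[label] / sum(probs.values()))
                   for probs, r in selected)
        scores[label] = info / total_weight

    best = min(scores.values())
    winners = [lab for lab, s in scores.items() if math.isclose(s, best)]
    # More than one label at the minimum NS leaves the sample uncertain.
    return winners[0] if len(winners) == 1 else DUMMY


labels = ["c1", "c2"]
cond_probs = [{"c1": 0.6, "c2": 0.2}, {"c1": 0.5, "c2": 0.3}]
weights = [0.4, 0.1]
chosen = assign_label(labels, cond_probs, weights)
```

Returning a sentinel rather than an arbitrary winner keeps tied or unsupported samples visible, so a later iteration can revisit them.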