expressed as:

$$P(\mathbf{x}) = \prod_{j=1}^{n} P(x_{m_j} \mid x_{m_{k(j)}}), \qquad 0 \le k(j) < j \tag{4.48}$$
where (1) the index set $m_1, m_2, \ldots, m_n$ is a permutation of the integer set $1, 2, \ldots, n$; (2) the ordered pairs $(x_{m_j}, x_{m_{k(j)}})$ are chosen so that they form the set of branches of a spanning tree defined on $X$ with their summed MI maximized; and (3) $P(x_{m_1} \mid x_{m_0}) = P(x_{m_1})$.
The probability defined above is known to be the best second-order approximation of the high-order probability distribution. Then, corresponding to each $\mathbf{x}$ in the ensemble, a probability $P(\mathbf{x})$ can be estimated.
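The construction above can be sketched in code. This is a minimal illustration, not the implementation from [13]: it assumes discrete-valued sample tuples, uses empirical frequencies for the probabilities, and uses Prim's algorithm to grow the maximum-MI spanning tree; all function names are illustrative.

```python
from collections import Counter
from math import log

def mutual_information(data, i, j):
    """Empirical mutual information between variables i and j of the sample tuples."""
    n = len(data)
    pij = Counter((x[i], x[j]) for x in data)
    pi = Counter(x[i] for x in data)
    pj = Counter(x[j] for x in data)
    return sum((c / n) * log((c * n) / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def dependence_tree(data, n_vars):
    """Grow a maximum-MI spanning tree (Prim's algorithm).
    Returns parent[j] = k(j); the root has parent None, i.e. P(x_m1 | x_m0) = P(x_m1)."""
    in_tree = {0}
    parent = {0: None}
    while len(in_tree) < n_vars:
        u, v, _ = max(((u, v, mutual_information(data, u, v))
                       for u in in_tree for v in range(n_vars) if v not in in_tree),
                      key=lambda t: t[2])
        parent[v] = u
        in_tree.add(v)
    return parent

def tree_probability(data, x, parent):
    """Second-order product approximation P(x) of Eq. (4.48) over the tree branches."""
    n = len(data)
    p = 1.0
    for j, k in parent.items():
        if k is None:
            p *= sum(1 for s in data if s[j] == x[j]) / n
        else:
            joint = sum(1 for s in data if s[j] == x[j] and s[k] == x[k])
            cond = sum(1 for s in data if s[k] == x[k])
            p *= joint / cond if cond else 0.0
    return p
```

Each sample's probability is then the product of one first-order term (the root) and $n-1$ second-order conditionals, which is what makes the approximation second order.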
In general, it is more likely for samples of relatively high probability to form
clusters. By introducing the mean probability below, samples can be divided into
two subsets: those above the mean and those below. Samples above the mean will
be considered first for cluster initiation.
Let $S = \{\mathbf{x}\}$ be the set of samples. The mean probability is defined as

$$\mu_s = \sum_{\mathbf{x} \in S} P(\mathbf{x}) \,/\, |S| \tag{4.49}$$

where $|S|$ is the number of samples in $S$. For more details on probability estimation with the dependence tree product approximation, please refer to [13].
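The mean-probability split of Eq. (4.49) can be sketched as follows; `split_by_mean_probability` is an illustrative name, and ties at the mean are placed in the "above" subset as an assumption:

```python
def split_by_mean_probability(samples, prob):
    """Divide samples into those at/above and those below the mean probability mu_s."""
    mu = sum(prob(x) for x in samples) / len(samples)   # Eq. (4.49)
    above = [x for x in samples if prob(x) >= mu]       # considered first for cluster initiation
    below = [x for x in samples if prob(x) < mu]
    return mu, above, below
```

The `above` subset is the pool from which cluster initiation starts, as described in the text.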
When distance is considered for cluster initiation, we can use the following criteria in assigning a sample $\mathbf{x}$ to a cluster, given a distance threshold $D$:

1. If there exists more than one cluster, say $\{C_k \mid k = 1, 2, \ldots\}$, such that $D(\mathbf{x}, C_k) \le D$ for all such $k$, then all these clusters can be merged together.
2. If exactly one cluster $C_k$ exists such that $D(\mathbf{x}, C_k) \le D$, then $\mathbf{x}$ can be grouped into $C_k$.
3. If $D(\mathbf{x}, C_k) > D$ for all clusters $C_k$, then $\mathbf{x}$ may not belong to any cluster.
To avoid including outliers in the distance calculation, we use a simple method suggested in [99], which assigns $D$ the maximum value of all nearest-neighbor distances in $L$, provided there is a sample in $L$ having a nearest-neighbor distance value of $D(\mathbf{x}, C_k) > 1$ (with the distance values rounded to the nearest integer value).
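The three assignment criteria can be sketched as below. This is not the procedure from the source verbatim: the sample-to-cluster distance is taken to be single-link (minimum over members) as an assumption, and criterion 3 is realized by letting the sample start its own cluster, which is one common choice.

```python
def cluster_distance(x, cluster, d):
    """Single-link distance from sample x to a cluster (an illustrative choice)."""
    return min(d(x, y) for y in cluster)

def assign_sample(x, clusters, d, D):
    """Apply the three criteria with threshold D.
    clusters is a list of lists of samples; d is a point-to-point distance."""
    near = [c for c in clusters if cluster_distance(x, c, d) <= D]
    rest = [c for c in clusters if c not in near]
    if len(near) > 1:                       # criterion 1: merge all nearby clusters
        return rest + [[y for c in near for y in c] + [x]]
    if len(near) == 1:                      # criterion 2: group x into the single C_k
        return rest + [near[0] + [x]]
    return clusters + [[x]]                 # criterion 3: no cluster fits; x stands alone
```

With a threshold chosen from nearest-neighbor distances as in [99], repeated calls to `assign_sample` over the high-probability samples yield the initial clusters.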
After finding the initial clusters along with their membership, the regrouping process is essentially an inference process for estimating the cluster label of a sample. Let $C = \{a_{c1}, a_{c2}, \ldots, a_{cq}\}$ be the set of labels for all possible clusters to which $\mathbf{x}$ can be assigned. For $X_k$ in $X$, we can form a contingency table between $X_k$ and $C$. Let $a_{ks}$ and $a_{cj}$ be possible outcomes of $X_k$ and $C$ respectively, and let $\mathrm{obs}(a_{ks})$ and $\mathrm{obs}(a_{cj})$ be the respective marginal frequencies of their observed occurrences.
The expected relative frequency of $(a_{ks}, a_{cj})$ is expressed as:

$$\exp(a_{ks}, a_{cj}) = \frac{\mathrm{obs}(a_{ks}) \times \mathrm{obs}(a_{cj})}{|S|} \tag{4.50}$$
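Eq. (4.50) is the familiar independence-based expectation from a contingency table, computed from the two marginals. A minimal sketch, assuming the observations arrive as (outcome, label) pairs and using an illustrative function name:

```python
from collections import Counter

def expected_relative_frequency(pairs):
    """Compute exp(a_ks, a_cj) = obs(a_ks) * obs(a_cj) / |S| for every outcome/label pair.
    pairs: list of (outcome of X_k, cluster label) observations, one per sample in S."""
    n = len(pairs)
    obs_x = Counter(a for a, _ in pairs)    # marginal frequencies obs(a_ks)
    obs_c = Counter(b for _, b in pairs)    # marginal frequencies obs(a_cj)
    return {(a, b): obs_x[a] * obs_c[b] / n for a in obs_x for b in obs_c}
```

Comparing these expected values against the observed joint frequencies is what drives the label inference during regrouping.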