fixed clustering method $M$. Let $D_i$ denote the $i$th random permutation of
$D_0$, obtained as described above, and let $S_i(k)$ denote the value of $S$
computed for the $k$-cluster partitioning of $D_i$, obtained by the same
clustering method $M$. The working hypothesis here is that, if $D_0$ exhibits
$k$ well-defined clusters, $S_0(k)$ should be significantly larger than
$S_i(k)$ for all random permutations. One measure of significance is the
(one-sided) empirical probability: if $S_0(k)$ is larger than all but $q - 1$
of the $m$ permutation values $S_i(k)$, the empirical probability of observing
$S_0(k)$ by random chance is less than $q/m$. The other measure of significance used
here is the $z$-score, computed from the mean $\bar{S}_k$ and standard deviation
$\sigma_S(k)$ of the random permutation values $\{S_i(k)\}$:
$$ z_k = \frac{S_0(k) - \bar{S}_k}{\sigma_S(k)}. \tag{15.10} $$
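The two significance measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `permutation_significance` is hypothetical, ties between the observed value and a permutation value are counted against the observed value, and the sample (n - 1) standard deviation is used, since the text does not say which estimator is intended.

```python
from statistics import mean, stdev


def permutation_significance(s0, s_perm):
    """Return (q, p_bound, z) for an observed score s0 and permutation scores s_perm.

    q       : s0 is larger than all but q - 1 of the permutation values
    p_bound : q / m, the bound on the one-sided empirical probability
    z       : (s0 - mean(s_perm)) / stdev(s_perm), as in Eq. (15.10)
    """
    m = len(s_perm)
    if m < 2:
        raise ValueError("need at least two permutation values")

    # Count the permutation values that s0 does NOT strictly exceed
    # (assumption: a tie counts as "not exceeded").
    not_exceeded = sum(1 for s in s_perm if s >= s0)
    q = not_exceeded + 1
    p_bound = q / m

    # Assumption: sample standard deviation (n - 1 denominator).
    sigma = stdev(s_perm)
    if sigma == 0:
        raise ValueError("permutation values have zero spread; z is undefined")
    z = (s0 - mean(s_perm)) / sigma
    return q, p_bound, z


# Toy numbers for illustration only; they are not results from the study.
q, p_bound, z = permutation_significance(0.9, [0.1, 0.2, 0.3, 0.4, 0.5])
```

With m = 100 permutations, as used later in this section, an observed value that exceeds every permutation value gives q = 1 and hence an empirical probability below 1/100.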
Experience with simulation datasets having known cluster structures has
shown that correct clusterings generally lead to significant results with respect to
the empirical probability estimates (e.g., $S_0(k)$ exceeds all of the randomized
values $S_i(k)$) and to maximal or near-maximal values of both $z_k$ and $S_0(k)$ over
the range of $k$ considered [16]. Experience has also shown that in cases where
no cluster structure exists (e.g., simulation datasets constructed from statistically
independent random data vectors), typically none of the $S_0(k)$ values meets these
significance criteria.
15.6.3. Summary of the Results
The clustering procedure described in Sec. 15.6.2 could be applied directly to
the attribute matrix defined by the 11 variables listed in Table 15.2, but it has
been shown using simulated datasets that the inclusion of extraneous variables
can degrade clustering results badly [7, 16]. Thus, the approach taken here starts
with the smallest interesting subset of these variables and proceeds in a manner
analogous to stepwise regression, including each variable only if it improves the
clustering. Since the original question motivating this work was the nature of
the relationship between the subjective association measure $S_{ab}$ and the objective
measure $U_{ab}$, the smallest subset considered here includes variables $V_1$ (the mean
$S_{ab}$ value over the 100 adverse events considered) and $V_3$ (the correlation between
$S_{ab}$ and $U_{ab}$).
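The stepwise idea described above might be sketched as a greedy forward selection. The sketch below rests on assumptions not stated in the text: candidates are tried once, in the order given; `score` is a hypothetical callable that returns a single clustering-quality number for a variable subset (for example, the best $z_k$ found for that subset); and "improves" is taken to mean a strictly larger score.

```python
def forward_select(start, candidates, score):
    """Greedy forward selection: keep a candidate only if it raises the score.

    start      : variables always included (the minimal subset)
    candidates : variables to try, in order (assumed: a single pass)
    score      : hypothetical callable mapping a list of variables to a number
    """
    selected = list(start)
    best = score(selected)
    for var in candidates:
        trial = selected + [var]
        trial_score = score(trial)
        if trial_score > best:  # assumption: strict improvement is required
            selected, best = trial, trial_score
    return selected, best


# Toy scoring function for illustration only: it rewards a made-up set of
# "useful" variables and penalizes the rest. It is not the measure S.
USEFUL = {"V1", "V3", "V5"}


def toy_score(variables):
    return sum(1 if v in USEFUL else -1 for v in variables)


subset, quality = forward_select(["V1", "V3"], ["V2", "V4", "V5"], toy_score)
```

Passing the scoring function in as a parameter keeps the selection loop independent of the clustering method and of the quality measure, both of which are defined elsewhere in the chapter.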
Fig. 15.6 shows the $k$-cluster partition results for $k$ from 2 through 10,
computed from this minimal variable set. The solid circles correspond to $S_0(k)$ and
the boxplots describe the range of $m = 100$ random permutation results $\{S_i(k)\}$.
Since none of these $S_0(k)$ values fall outside the ranges of the random permuta-