Biomedical Engineering Reference
Fig. 2. Left: stage 2 bout duration PDF; right: CDF with quartiles as approximation
Measuring Clustering Stability. Since clustering parameters are initialized pseudo-
randomly, the results may vary across runs. It is therefore important to gauge the varia-
tion of clustering results for different starting conditions. Stability of clustering results
was assessed by comparing the clusters resulting from all pairs of 50 seed values for
a given value of k. A measure of agreement of two clusterings based on the fraction
of pairs of instances that are grouped together in the same cluster by each of the two
clusterings, the adjusted Rand Index [17], was computed for all pairs of seed values.
This index has a maximum value of 1, attained only for two identical clusterings. The
adjusted Rand index of a randomly selected pair of clusterings is 0 on average. As com-
pared with the standard Rand Index [29], the adjusted Rand Index is therefore much
stricter, as it accounts for the degree of matching expected by chance. Subsequent ex-
periments were performed with a clustering of maximum mean adjusted Rand Index.
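The stability assessment above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `adjusted_rand_index` and `most_stable_clustering` are hypothetical, the label lists stand in for the outputs of runs with different seeds, and reading "maximum mean adjusted Rand Index" as "the run whose mean index against all other runs is largest" is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two labelings of the same instances."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same instances")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two instances: trivially identical
    # Pairs grouped together by both clusterings (contingency table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together by each clustering on its own.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


def most_stable_clustering(labelings):
    """Return (index, mean ARIs) for the run agreeing best with the others.

    Assumes at least two runs; ties go to the earliest run.
    """
    n_runs = len(labelings)
    totals = [0.0] * n_runs
    for i, j in combinations(range(n_runs), 2):
        ari = adjusted_rand_index(labelings[i], labelings[j])
        totals[i] += ari
        totals[j] += ari
    means = [t / (n_runs - 1) for t in totals]
    best = max(range(n_runs), key=means.__getitem__)
    return best, means


# Hypothetical label vectors from three seeds (not data from the study).
runs = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first run, labels permuted
    [0, 0, 1, 1, 1, 2],
]
best_run, mean_ari = most_stable_clustering(runs)
```

Because the index depends only on which instances are grouped together, relabeling the clusters of a run does not change its value.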
2.4 Statistical Significance
Multiway and Pairwise Comparisons. When comparing means or medians of several
populations (e.g., clusters), ANOVA or a Kruskal-Wallis test is used, respectively. Likewise,
statistical significance of differences of means or medians between pairs of populations
is tested by using either a t-test or a Wilcoxon rank sum test, respectively. ANOVA and
t-tests presuppose normality of the distribution of the means, a condition that may not
hold exactly in all cases. Nearly all of the comparisons performed in the present paper
involve populations with several dozen members, and the normality condition is satis-
fied approximately. In any case, the Kruskal-Wallis and Wilcoxon rank sum tests do not
presuppose normality, and provide additional confidence regarding statistical validity.
A two-sample Kolmogorov-Smirnov test is used to compare probability distributions
without any assumptions of a particular functional form, and without targeting any par-
ticular statistic such as the mean or median.
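To make the distribution-free nature of the two-sample Kolmogorov-Smirnov test concrete, the sketch below computes its statistic, the largest gap between the two empirical CDFs, together with the standard large-sample p-value approximation. It is an illustration under stated assumptions: the function names are hypothetical, the source does not say which implementation or p-value method was used, and the asymptotic formula is only rough for small samples.

```python
import bisect
import math


def ks_two_sample_statistic(x, y):
    """Largest absolute difference between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # The supremum is attained at one of the observed values.
    for v in xs + ys:
        cdf_x = bisect.bisect_right(xs, v) / n
        cdf_y = bisect.bisect_right(ys, v) / m
        d = max(d, abs(cdf_x - cdf_y))
    return d


def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Large-sample approximation to the two-sided p-value.

    Uses the Kolmogorov series 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    with lam = d * sqrt(n * m / (n + m)).
    """
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the true value is ~1
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical samples (not data from the study).
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 5.0, 6.0]
d_stat = ks_two_sample_statistic(sample_a, sample_b)
p_value = ks_asymptotic_pvalue(d_stat, len(sample_a), len(sample_b))
```

Only the ordering of the observations enters the statistic, which is why no functional form for the distributions needs to be assumed.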
Correction for Increased Type I Error due to Multiple Comparisons. Several of
the results described are obtained through exploratory data analysis, involving the si-
multaneous testing of multiple statistical hypotheses. In any such situation, the risk of
a type I inference error - incorrectly rejecting a null hypothesis - increases due to the
accumulation of error over multiple comparisons. This issue is addressed in the present
paper using the method of [2]. Given n prospective individual findings with associated
 