Biomedical Engineering Reference
With an average of 250 possible bout durations per stage, this process yields a feature
vector of length approximately 1250 for each data instance. The dimensionality of this
raw feature space therefore exceeds the number of available instances by a factor of
approximately 6, which leads to sparsely populated feature vectors.
Bout Duration Quantile Features. Selected features of the duration distributions were
used to reduce the dimensionality of the data representation. Specifically, only selected
quantiles (e.g., quartiles, deciles) were used to describe each stage. For a given integer
$q \ge 2$, and each stage $X$, the $q$-quantiles are defined as follows. Let $i$ be an integer in
the range $i = 1, 2, \ldots, q-1$. The $i$-th $q$-quantile $X.Q^{(q)}_i$ of stage $X$ is:
$$X.Q^{(q)}_i = \operatorname*{argmin}_{d} \left\{ F_X(d) \ge \tfrac{1}{q}\, i \right\},$$
where $F_X$ is the CDF of stage $X$. In words, the value of $X.Q^{(q)}_i$ for a given set of
hypnograms is the smallest $d$ for which at least a fraction $i/q$ of the stage $X$ bouts in
the input set have a duration of $d$ or less. As an illustration in the case $q = 4$, the CDF
of NREM stage 2 bout durations for the entire set of 244 hypnograms is shown in Fig. 2
(right), together with the compressed quartile representation, visualized as a piecewise
constant approximation with jumps at the quartile durations.
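
The quantile definition above can be sketched directly from the empirical CDF. This is a minimal illustration, not the authors' code: the function name `q_quantiles` and the sample durations are hypothetical, and durations are assumed to be given as a plain list of numbers (for example, bout lengths in epochs).

```python
def q_quantiles(durations, q):
    """Return [Q_1, ..., Q_{q-1}], where Q_i is the smallest d such that
    the empirical CDF satisfies F(d) >= i/q (the definition in the text)."""
    if q < 2:
        raise ValueError("q must be an integer >= 2")
    if not durations:
        raise ValueError("at least one bout duration is required")
    xs = sorted(durations)
    n = len(xs)
    result = []
    for i in range(1, q):
        # Smallest rank m (1-based) with m/n >= i/q, i.e. m = ceil(i*n/q),
        # computed with integer arithmetic to avoid floating-point rounding.
        m = -(-i * n // q)
        result.append(xs[m - 1])
    return result


# Hypothetical bout durations for one stage.
example = [1, 2, 2, 3, 5, 8, 13, 21]
print(q_quantiles(example, 4))  # quartiles: [2, 3, 8]
```

For the example list, 3 of the 8 durations are at most 2 (a fraction of at least 1/4), while only 1 of 8 is at most 1, so the first quartile is 2; the other two values follow in the same way.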
Selection of the Number of Quantiles. As Fig. 2 (right) suggests, the CDF approximation
error decreases as the number of quantiles, $q$, increases. However, the variance
of the quantile estimates themselves will increase, as the number of samples available
per quantile decreases with increasing $q$. Thus, one expects that there may be an optimal
range of values for $q$. Experiments were therefore performed to determine how the
results of the clustering technique (section 2.3) depend on $q$.
The mean Rand index stability value (see section 2.3) was observed to attain a maximum
at or near $q = 4$, for a number of clusters between 2 and 5. The value $q = 4$
is appealing because quartiles are easily understood. Hence, 4-quantiles (quartiles) were used in all
subsequent work reported in the present paper. The three bout duration quartile values
$X.Q^{(4)}_1$, $X.Q^{(4)}_2$, $X.Q^{(4)}_3$ were used to describe each of the five stages, $X$ (wake,
N1, N2, SWS, REM), yielding a 15-dimensional feature vector for each instance. This
data representation was used for the clustering analysis (section 2.3).
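
As a rough sketch of how such a 15-dimensional vector might be assembled, the following assumes a hypnogram is available as a sequence of per-epoch stage labels. The names `bout_durations` and `feature_vector`, the label strings, and the `missing` fill value for a stage with no bouts are all assumptions made for illustration; the text does not say how absent stages were handled.

```python
from itertools import groupby

STAGES = ("wake", "N1", "N2", "SWS", "REM")  # stage order is an assumption


def bout_durations(hypnogram):
    """Run-length encode a stage sequence into {stage: [bout lengths in epochs]}."""
    bouts = {stage: [] for stage in STAGES}
    for stage, run in groupby(hypnogram):
        bouts[stage].append(sum(1 for _ in run))
    return bouts


def quartiles(durations):
    """Three quartiles: smallest d with empirical CDF F(d) >= i/4, i = 1, 2, 3."""
    xs = sorted(durations)
    n = len(xs)
    return [xs[-(-i * n // 4) - 1] for i in (1, 2, 3)]


def feature_vector(hypnogram, missing=0):
    """Concatenate the three quartiles of each of the five stages (15 values).
    `missing` is a hypothetical placeholder for stages that never occur."""
    bouts = bout_durations(hypnogram)
    vector = []
    for stage in STAGES:
        vector.extend(quartiles(bouts[stage]) if bouts[stage] else [missing] * 3)
    return vector


# Hypothetical toy hypnogram, one label per epoch.
toy = (["wake"] * 3 + ["N1"] * 2 + ["N2"] * 10 + ["SWS"] * 6
       + ["N2"] * 4 + ["REM"] * 5 + ["wake"] * 1)
print(feature_vector(toy))
# [1, 1, 3, 2, 2, 2, 4, 4, 10, 6, 6, 6, 5, 5, 5]
```

Note that the paper's quantiles are described for a given set of hypnograms, so the same functions could equally be applied to bouts pooled over several recordings.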
2.3 Clustering
Clustering was applied to the 15-dimensional feature vectors (section 2.2) to seek objectively
defined groups of hypnograms with distinct bout duration characteristics.
Clustering Technique. The technique of Expectation-Maximization (EM) clustering
was selected after an experimental comparison with k-means clustering showed higher
stability of the EM clustering results with respect to pseudorandom initial parameter
variation (see section 3.1). EM performs iterative maximum likelihood estimation of
the cluster parameters [12,26]. Clustering experiments were carried out using the Weka
data mining toolkit [16]. A mixture of Gaussians is used as the cluster model, and initial
parameter values are found through k-means clustering.
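
To make the idea concrete, here is a small pure-Python sketch of EM for a Gaussian mixture with k-means initialisation. It is not the Weka implementation used in the study: it is restricted to one-dimensional data for brevity (the paper's vectors are 15-dimensional), and the function names, iteration counts, variance floor, and sample data are all assumptions.

```python
import math
import random


def kmeans_1d(xs, k, iters=20, seed=0):
    """Crude 1-D k-means, used here only to initialise the EM means."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, k, iters=50, seed=0, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; returns
    (weights, means, variances, hard cluster labels)."""
    n = len(xs)
    mus = kmeans_1d(xs, k, seed=seed)
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, min_var)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, mus, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave a vanished component unchanged
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, mus, variances, labels


# Hypothetical data with two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 10.0, 10.3, 9.7, 10.1, 9.9]
weights, mus, variances, labels = em_gmm_1d(data, k=2)
print(sorted(round(m, 2) for m in mus))
```

Changing `seed` varies the pseudorandom starting point, which is the kind of initial-parameter variation against which the text reports EM to be more stable than k-means.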