The various existing rules of thumb generally lead to the use of values of
m of 1 or 2 for data records of length N ranging from 100 to 5000 data
points and values of r between 0.1 and 0.25 to determine the tolerance. In
general, the accuracy and confidence of the entropy estimate improve as
the numbers of matches of length m and m + 1 increase. By this token,
the number of matches can be increased by choosing a small m (short
templates) and a larger r (wide tolerance). There are penalties, however,
for criteria that are too relaxed: as r increases, the probability of matches
tends toward 1, and SampEn tends to 0 for all processes, thereby
reducing the ability to distinguish any salient features in the data set;
and as m decreases, underlying physical processes that are not optimally
apparent at smaller values of m may be obscured.
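The first penalty can be written out explicitly. The following is a sketch, assuming the usual definition of SampEn as the negative logarithm of the ratio A/B (see Lake et al., 2002), that the same number of templates, N - m, is used for both template lengths, and that matches are counted as ordered pairs with self-matches excluded. Once the tolerance is wide enough that every pair of templates matches, both counts reach the same maximum:

```latex
\mathrm{SampEn}(m, r, N) = -\ln\frac{A}{B},
\qquad
\lim_{r \to \infty} A = \lim_{r \to \infty} B = (N - m)(N - m - 1)
\;\Longrightarrow\;
\lim_{r \to \infty} \mathrm{SampEn}(m, r, N) = -\ln 1 = 0 .
```

Under these assumptions the limit does not depend on the process that generated the data, which is why an overly wide tolerance removes the ability to discriminate between series.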
This being said, in most current applications the parameter values of
choice are m = 2 and t = 0.2*SD, which means we are counting templates
with a length of 2 to calculate B and templates with a length of 3 to
calculate A, and the tolerance for matches is set to 0.2 times the SD of the
process. In most cases, all readings in the observed sample are first
divided by the SD of that sample, so the SD of the sample becomes
exactly SD = 1, in which case t = r = 0.2. This preprocessing of the data
eliminates the influence of the variance of the sample on the irregularity
(or complexity) of the process, thus leaving SampEn to pick up only
characteristics strictly related to the sequential timing of the observations
and generally independent from the distribution of the observations.
More details, including the strict definition of SampEn, can be found in
Lake et al. (2002).
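The effect of dividing by the SD can be checked numerically. The following is a minimal sketch, not the authors' software: the helper name normalize_by_sd is hypothetical, and it assumes the SD is the sample SD (n - 1 denominator), which is the convention that yields 0.527 for the periodic series 1,0,1,0,1,0,1,0,1,0.

```python
import statistics

def normalize_by_sd(values):
    """Divide every reading by the sample SD (n - 1 denominator).

    Hypothetical helper for illustration only.
    """
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("constant series: SD is zero, cannot normalize")
    return [v / sd for v in values]

# Periodic series used in the chapter's worked example.
s1 = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
r = 0.2

sd_raw = statistics.stdev(s1)        # SD of the raw readings
t_raw = r * sd_raw                   # tolerance on the raw scale

scaled = normalize_by_sd(s1)
sd_scaled = statistics.stdev(scaled) # SD after rescaling
t_scaled = r * sd_scaled             # tolerance on the rescaled data

print(f"raw:    SD = {sd_raw:.3f}, t = {t_raw:.4f}")
print(f"scaled: SD = {sd_scaled:.3f}, t = {t_scaled:.4f}")
```

After rescaling, the SD is 1 up to floating-point rounding, so the tolerance t coincides with r. Because distances between readings shrink by the same factor as the tolerance, the set of matching templates is unchanged by this step.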
Although the computation of SampEn for long time series certainly
requires appropriate software (see Internet Resources at the end of the
chapter), one simple numerical example using the short sequences
considered earlier should clarify the template counting algorithm.
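For readers who prefer code to hand counting, the template counting can be sketched in a few lines of Python. This is an illustrative brute-force sketch, not the software referred to above. The function names are hypothetical, and it assumes: the sample SD (n - 1 denominator); N - m templates for both lengths; the maximum coordinate-wise difference as the distance between templates; a match when that distance is at most t; and SampEn = -ln(A/B).

```python
import math
import statistics

def count_matches(series, length, n_templates, tol):
    """Count ordered template pairs (i, j), i != j, that match within tol."""
    templates = [series[i:i + length] for i in range(n_templates)]
    count = 0
    for i, a in enumerate(templates):
        for j, b in enumerate(templates):
            if i == j:
                continue  # self-matches are excluded
            # Distance = largest absolute difference between paired points.
            if max(abs(x - y) for x, y in zip(a, b)) <= tol:
                count += 1
    return count

def sample_entropy(series, m=2, r=0.2):
    """Return (B, A, SampEn) for template lengths m and m + 1."""
    n = len(series)
    tol = r * statistics.stdev(series)   # t = r * SD
    n_templates = n - m                  # same count for both lengths
    b = count_matches(series, m, n_templates, tol)
    a = count_matches(series, m + 1, n_templates, tol)
    if a == 0 or b == 0:
        return b, a, float("nan")        # ratio undefined without matches
    return b, a, math.log(b / a)         # equals -ln(A / B)

# The two short sequences used in Example 6-1.
S1 = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
S2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1]

for name, seq in (("S1", S1), ("S2", S2)):
    b, a, se = sample_entropy(seq)
    print(f"{name}: B = {b}, A = {a}, SampEn = {se:.4f}")
```

The double loop costs on the order of N squared comparisons, which is acceptable for sequences of this size but too slow for long records; that is one reason dedicated software is recommended for real data.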
Example 6-1
For m = 2 and r = 0.2, calculate the SampEn and SD for the sequences S1:
1,0,1,0,1,0,1,0,1,0 and S2: 1,1,1,0,0,1,0,0,0,1, and compare the results.
We begin with the periodic series S1. The SD of this sample is 0.527.
Thus, for r = 0.2, the tolerance for similarity between two templates
would be t = (0.2)(0.527) = 0.1054. With m = 2, all subsequences of
length 2 (beginning at up to N-m) in the series are 10,01,10,01,10,01,10,01.
Given a similarity tolerance of t = 0.1054, two subsequences would
be matches only if they are identical. Thus, the total number of
template matches of length m = 2 is B = 3 + 3 + 3 + 3 + 3 + 3 + 3 + 3 = 24
(each template 10 or 01 has exactly three matches, excluding
self-matches). All subsequences of length m + 1 = 3 in the above series are
101,010,101,010,101,010,101,010. Thus, the total number of template