Then, we count the number $N_r^m(i)$ of $m$-dimension vectors $y(j)$ within a distance $r$ of a given vector $y(i)$, and compute $C_r^m(i)$ as the ratio of $N_r^m(i)$ to the total number of $m$-dimension vectors in the time series:

$$C_r^m(i) = \frac{N_r^m(i)}{N - m + 1} \tag{2}$$
That is, $C_r^m(i)$ approximates the probability of finding any $m$-dimension vector $y(j)$ similar to the vector $y(i)$ within a tolerance factor $r$. Then, the logarithmic average over $i$ of the $C_r^m(i)$ probabilities is defined as:

$$C_r^m = \frac{1}{N - m + 1} \sum_{i=1}^{N-m+1} \ln C_r^m(i) \tag{3}$$
The approximate entropy $\mathrm{ApEn}(m, r)$ is then defined as $\mathrm{ApEn}(m, r) = C_r^m - C_r^{m+1}$. The $\mathrm{ApEn}(m, r)$ value is related to the probability that sequences that are similar for $m$ samples remain similar for $m + 1$ samples. A high value of $\mathrm{ApEn}(m, r)$ means that we cannot accurately predict the next sample from the knowledge of the previous $m$ samples. In practice, $m = 2$ is a good value for short time series where high unpredictability is expected, and the tolerance factor $r$ is selected as one or two times the standard deviation of the time series.
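The definitions in equations (2) and (3) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `approximate_entropy` is hypothetical, the distance between vectors is assumed to be the maximum absolute component difference, "within a distance r" is read as less than or equal to r (so each vector matches itself), and the default tolerance is assumed to be one standard deviation of the series.

```python
import math


def approximate_entropy(x, m=2, r=None):
    """Sketch of ApEn(m, r) for a sequence x, following equations (2) and (3)."""
    n = len(x)
    if r is None:
        # Assumption: default tolerance is one standard deviation of the series.
        mean = sum(x) / n
        r = math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def log_average(dim):
        count = n - dim + 1  # total number of dim-dimension vectors
        vectors = [x[i:i + dim] for i in range(count)]
        total = 0.0
        for yi in vectors:
            # N_r(i): vectors within distance r of y(i), self-match included.
            # Assumption: distance is the maximum absolute component difference.
            near = sum(
                1 for yj in vectors
                if max(abs(a - b) for a, b in zip(yi, yj)) <= r
            )
            total += math.log(near / count)  # ln C_r(i), equation (2)
        return total / count  # logarithmic average, equation (3)

    return log_average(m) - log_average(m + 1)


# A constant series is perfectly predictable, so its ApEn is zero.
print(approximate_entropy([1.0] * 20))
```

Because every vector matches itself, the logarithm is never taken of zero. The double loop costs on the order of $N^2$ comparisons, which is acceptable for the short time series discussed here.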
3.2 LZ Complexity
The Lempel-Ziv [8] complexity measure (LZ) assesses the regularity and randomness of a symbol sequence. It compares the length of the original sequence ($n$) with the shorter length of the compressed sequence that can be obtained when repetitive subsequences are coded by means of a single index. A small value of the LZ complexity parameter $\gamma$ indicates that highly repetitive, non-random patterns are present in the sequence.
The basic procedure is as follows [9]. First, the signal is reduced to a sequence of $n$ binary symbols. Then the sequence is parsed from left to right in order to construct a new compressed sequence by applying two basic operations, copy and insert. During this process, a set of words (or dictionary) is built that contains all the subsequences previously found in the already parsed segment of the original sequence. The insert action simply adds the present symbol to the compressed sequence. The copy action is applied when the present and preceding symbols form a pattern already coded in the dictionary. In this case, a dot is placed in the compressed sequence. When the parsing reaches the end of the given original sequence, a shorter compressed sequence has been generated. The length of this compressed sequence depends on the presence of repetitive subsequences, and it is smaller for highly redundant sequences.
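As a rough illustration of the parsing idea, the sketch below counts the phrases produced by a left-to-right Lempel-Ziv parsing of a binary sequence. It is a sketch under stated assumptions rather than the exact copy/insert bookkeeping of [9]: the helper names `binarize` and `lz_phrase_count` are hypothetical, the signal is assumed to be binarized by thresholding at its median (the text does not specify the rule), and each phrase is taken to be the longest run that can be copied from the preceding text plus one inserted symbol.

```python
import statistics


def binarize(signal):
    """Reduce a signal to binary symbols (assumption: threshold at the median)."""
    threshold = statistics.median(signal)
    return [1 if v > threshold else 0 for v in signal]


def lz_phrase_count(symbols):
    """Count the phrases in a left-to-right Lempel-Ziv parsing of symbols."""
    s = "".join(str(b) for b in symbols)
    n = len(s)
    count = 0
    start = 0
    while start < n:
        length = 1
        # "Copy": extend the candidate phrase while it already occurs in the
        # text that precedes its last symbol.
        while (start + length <= n
               and s[start:start + length] in s[:start + length - 1]):
            length += 1
        # "Insert": the symbol that broke the match closes the phrase
        # (or the sequence ended in the middle of a copy).
        count += 1
        start += length
    return count


# Parsed as 0 | 001 | 10 | 100 | 1000 | 101, i.e. six phrases.
print(lz_phrase_count([0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1]))  # 6
```

A highly redundant input such as sixteen zeros is parsed into only two phrases, which matches the statement that the compressed description is shorter for repetitive sequences. The substring search makes this version slow for long inputs, so it is meant for illustration only.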
At the end of this procedure, the Kolmogorov complexity $c(n)$ is computed as the length of the shortest compressed sequence that can be generated by the procedure described above, and it is associated with a kind of minimal-length description. This value has to be normalized by $b(n) = n / \log_2 n$ in order to force the Shannon entropy of the given sequence to be 1. The LZ complexity index is then defined as $\gamma = c(n) / b(n)$.
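The normalization step can be checked with a small numeric sketch. The function name `lz_gamma` is hypothetical, and the values of `c` and `n` in the usage line are illustrative inputs, not results from the text.

```python
import math


def lz_gamma(c, n):
    """Normalized LZ complexity: gamma = c(n) / b(n), with b(n) = n / log2(n)."""
    b = n / math.log2(n)  # normalization factor b(n)
    return c / b


# Illustrative values: a 16-symbol sequence with c(n) = 6 gives b(n) = 16 / 4 = 4.
print(lz_gamma(6, 16))  # 1.5
```

For very short sequences $\gamma$ can exceed 1, as in this example, because $b(n)$ is an asymptotic normalization.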
 