15.2 Methodology
15.2.1 Transfer Entropy (TE)
Consider a $k$th order Markov process [10] described by

$P(x_{n+1} \mid x_n, x_{n-1}, \ldots, x_{n-k+1}) = P(x_{n+1} \mid x_n, x_{n-1}, \ldots, x_{n-k})$,   (15.1)
where $P$ represents the conditional probability of state $x_{n+1}$ of a random process $X$ at time $n+1$. Equation (15.1) implies that the probability of occurrence of a particular state $x_{n+1}$ depends only on the past $k$ states $x_n^{(k)}$ of the system.
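For instance, for a first-order process ($k = 1$), Equation (15.1) reduces to the familiar Markov chain property $P(x_{n+1} \mid x_n) = P(x_{n+1} \mid x_n, x_{n-1})$: given the present state $x_n$, the additional past state $x_{n-1}$ carries no further information about $x_{n+1}$.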
The definition given in Equation (15.1) can be extended to the case of Markov interdependence of two random processes $X$ and $Y$ as

$[x_n, \ldots, x_{n-k+1}] \equiv x_n^{(k)}$

$P(x_{n+1} \mid x_n^{(k)}) = P(x_{n+1} \mid (x_n^{(k)}, y_n^{(l)}))$,   (15.2)
where $x_n^{(k)}$ are the past $k$ states of the first random process $X$ and $y_n^{(l)}$ are the past $l$ states of the second random process $Y$. This generalized Markov property implies that the state $x_{n+1}$ of the process $X$ depends only on the past $k$ states of the process $X$ and not on the past $l$ states of the process $Y$.
However, if the process $X$ also depends on the past states (values) of process $Y$, the divergence of the hypothesized transition probability $P(x_{n+1} \mid x_n^{(k)})$ (L.H.S. of Equation (15.2)) from the true underlying transition probability of the system $P(x_{n+1} \mid (x_n^{(k)}, y_n^{(l)}))$ (R.H.S. of Equation (15.2)) can be quantified using the Kullback-Leibler measure [11]. The Kullback-Leibler measure then quantifies the transfer of entropy from the driving process $Y$ to the driven process $X$; denoting this by $\mathrm{TE}(Y \to X)$, we have
$\mathrm{TE}(Y \to X) = \sum_{n=1}^{N} P(x_{n+1}, x_n^{(k)}, y_n^{(l)}) \log_2 \frac{P(x_{n+1} \mid x_n^{(k)}, y_n^{(l)})}{P(x_{n+1} \mid x_n^{(k)})}$.   (15.3)
The values of the parameters $k$ and $l$ are the orders of the Markov process for the two coupled processes $X$ and $Y$, respectively. The value of $N$ denotes the total number of available points per process in the state space.
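To make Equation (15.3) concrete, the following sketch estimates TE(Y → X) with plug-in (histogram) probabilities after coarse-graining the data into discrete bins. It is only an illustration, not the neighborhood-based estimator of [22]; the function name, the binning scheme, and the default bin count are assumptions made for this example.

import numpy as np
from collections import Counter

def transfer_entropy(x, y, k=1, l=1, bins=8):
    """Plug-in estimate of TE(Y -> X), Equation (15.3).

    x, y : 1-D arrays of equal length; k, l : Markov orders;
    bins : number of amplitude bins used to discretize the data
    (an assumption of this sketch, not part of the original method).
    """
    # Coarse-grain each series into integer symbols 0..bins-1.
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])

    # Count joint occurrences of (x_{n+1}, x_n^{(k)}, y_n^{(l)}).
    joint = Counter()
    for n in range(max(k, l) - 1, len(xd) - 1):
        xk = tuple(xd[n - k + 1 : n + 1])  # past k states of X
        yl = tuple(yd[n - l + 1 : n + 1])  # past l states of Y
        joint[(xd[n + 1], xk, yl)] += 1

    # Marginal counts give the probabilities appearing in (15.3).
    total = sum(joint.values())
    p_xkyl, p_x1xk, p_xk = Counter(), Counter(), Counter()
    for (x1, xk, yl), c in joint.items():
        p_xkyl[(xk, yl)] += c
        p_x1xk[(x1, xk)] += c
        p_xk[xk] += c

    te = 0.0
    for (x1, xk, yl), c in joint.items():
        p_full = c / p_xkyl[(xk, yl)]          # P(x_{n+1} | x^{(k)}, y^{(l)})
        p_self = p_x1xk[(x1, xk)] / p_xk[xk]   # P(x_{n+1} | x^{(k)})
        te += (c / total) * np.log2(p_full / p_self)
    return te

For two coupled series, TE(Y → X) would then be estimated as transfer_entropy(x, y, k=2, l=2); with finite data the plug-in estimate is biased, which is exactly why the choices of $k$ and $l$ (and, in [22], the radius $r$) discussed next matter.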
In search of an optimal $k$, it would generally be desirable to choose the parameter $k$ as large as possible in order to find an invariant value (e.g., for the conditional entropies to converge as $k$ increases), but in practice the finite size of any real data set imposes the need for a reasonable compromise between finite-sample effects and the approximation of the actual probabilities. Therefore, the selection of $k$ and $l$ plays a critical role in obtaining reliable values for the transfer of entropy from real data.
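As a rough illustration of that compromise, one could scan increasing values of $k$ and stop once the TE estimate stabilizes. The helper below is hypothetical, built on the transfer_entropy sketch above, with an arbitrarily chosen tolerance tol.

def scan_markov_order(x, y, l=1, k_max=6, tol=0.01):
    """Hypothetical scan: return the smallest k at which successive
    TE estimates differ by less than tol, plus all estimates."""
    estimates = [transfer_entropy(x, y, k=k, l=l) for k in range(1, k_max + 1)]
    for i in range(1, k_max):
        if abs(estimates[i] - estimates[i - 1]) < tol:
            return i + 1, estimates  # estimates[i] corresponds to k = i + 1
    return k_max, estimates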
The estimation of TE as suggested in [22] also depends on the neighborhood size (radius $r$) used in the state space for the calculation of the involved joint and conditional probabilities. The value of the radius $r$ in the state space defines the maximum-norm distance used in the search for neighboring state-space points. Intuitively,