Database Reference
In-Depth Information
F
1
P P
:
:
58
The
k
-means algorithm (
k ¼
2) applied to this problem automatically finds the
first partition
G
1
¼
{1,2},
G
2
¼
{3} which is in fact the best one.
If we, by the way, use the global transition probabilities
P
for all
i
, i.e.,
P
ðÞ
¼ P
,
this results, as expected, in a higher approximation error of approximately 2.07.
■
10.3 Estimation of Factorized Transition Probabilities
In what follows, we shall present a procedure to estimate the factorized transition
probabilities in an adaptive online fashion, that is, a method doing without any
representation of an estimate of the full transition probability tensor.
To this end, recall the update rule for the transition probabilities of a classical
MDP presented in Chap.
3
. In virtue of state space augmentation, this rule and the
corresponding convergence result carry over to the
k
-MDP case as follows:
P
s
ss
0
þ t
1
:¼
1
t
1
P
s
ss
0
:
In tensor notation, the update rule reads as
P þ t
1
e
s
e
s
e
s
0
,
:¼
1
t
1
P
where
1,
i ¼ j
,
eðÞ
j
¼
0,
i
6¼ j
:
Matricifying with respect to the first mode and pre-multiplying with
U
+
eventu-
ally gives rise to the update rule
C
1
þ t
1
e
β ðÞ
:¼
1
t
1
C
G
β ðÞ
e
s
e
s
0
,
ð
10
:
7
Þ
where
β
(s) denotes the unique index
β
∈
m
satisfying s
G
β
, for the core tensor. In
∈
index notation, this corresponds to
C
ss
0
β ðÞ
þ t
1
C
ss
0
β ðÞ
:¼
1
t
1
:
ð
10
:
8
Þ
Co
nvergence of the update rule follows immediately from convergence of that
for
P
and partial continuity of the multilinear product.
Besides its computational use, the update rule also reveals another interesting
property of the factorization model: for the trivial partition, that is, the partition