The Big Picture: Toward a Synthesis of RL and Adaptive Tensor Factorization - Realtime Data Mining

Database Reference

In-Depth Information

F 1

P P

:

58

The k -means algorithm ( k ¼ 2) applied to this problem automatically finds the

first partition G 1 ¼ {1,2}, G 2 ¼ {3} which is in fact the best one.

If we, by the way, use the global transition probabilities P for all i , i.e., P ðÞ ¼ P ,

this results, as expected, in a higher approximation error of approximately 2.07.

■

10.3 Estimation of Factorized Transition Probabilities

In what follows, we shall present a procedure to estimate the factorized transition

probabilities in an adaptive online fashion, that is, a method doing without any

representation of an estimate of the full transition probability tensor.

To this end, recall the update rule for the transition probabilities of a classical

MDP presented in Chap. 3 . In virtue of state space augmentation, this rule and the

corresponding convergence result carry over to the k -MDP case as follows:

P s ss 0 þ t 1

:¼ 1 t 1

P s ss 0

:

In tensor notation, the update rule reads as

P þ t 1 e s e s e s 0 ,

:¼ 1 t 1

P

where

1,

i ¼ j ,

eðÞ j ¼

0,

i

6¼ j

:

Matricifying with respect to the first mode and pre-multiplying with U + eventu-

ally gives rise to the update rule

C 1

þ t 1 e β ðÞ

:¼ 1 t 1

C

G β ðÞ e s e s 0 ,

ð 10

:

7 Þ

where

β

(s) denotes the unique index

β ∈

m satisfying s

G β , for the core tensor. In

∈

index notation, this corresponds to

C ss 0 β ðÞ þ t 1

C ss 0 β ðÞ :¼ 1 t 1

:

ð 10

:

8 Þ

Co nvergence of the update rule follows immediately from convergence of that

for P and partial continuity of the multilinear product.

Besides its computational use, the update rule also reveals another interesting

property of the factorization model: for the trivial partition, that is, the partition

Realtime Data Mining

Search WWH ::

Custom Search

Home