and its inverse through $p^a$:

$$p^{[\,]} = F_{\Pi^a}^{-1}\left(p^a\right) \qquad (5.15)$$
The inverse mapping requires the solution of an equation system, which is not difficult to compute. Although (5.14) is formally nonlinear with respect to $p^{[\,]}$, in practice the solution proceeds in much the same way. First, we consider all recommended successor products $s' \in S^a$ and calculate their corresponding conditional probabilities $p^a_{ss'}$. This requires the solution of a linear equation system with $|S^a|$ unknowns. Then we turn to the remaining, not-recommended successor states $s' \notin S^a$ and directly calculate their unconditional probabilities $p_{ss'}$.
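The two-stage computation can be sketched as follows. The coefficient matrix `A` and right-hand side `b` are hypothetical stand-ins for the coefficients that (5.14) yields for the recommended successors, since (5.14) itself is not reproduced here:

```python
import numpy as np

def inverse_mapping_sketch(A, b, p_unrec):
    """Two-stage sketch of the inverse mapping (5.15).

    Stage 1: solve the linear system A x = b for the conditional
    probabilities p^a_{ss'} of the recommended successors s' in S^a;
    this is the linear equation system with |S^a| unknowns mentioned
    in the text (A and b are hypothetical placeholders for the
    coefficients arising from (5.14)).
    Stage 2: the unconditional probabilities p_{ss'} of the remaining,
    not-recommended successors are computed directly; here they are
    simply passed through unchanged.
    """
    p_rec = np.linalg.solve(A, b)  # stage 1: |S^a| x |S^a| linear system
    return p_rec, p_unrec          # stage 2: direct computation
```

This is only a structural sketch of the two stages, not the actual coefficients of (5.14).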
That way, we end up with Algorithm 5.2 for multiple recommendations, which is quite similar to Algorithm 5.1 for single recommendations.
Algorithm 5.2: Update of the internal from conditional probabilities for multiple recommendations, linear mapping

Input: vector of internal probabilities $^{j}p^{[\,]}$ and fixed probabilities $^{j}\Pi^a$, delivered recommendations $a = (a_1, \dots, a_k)$, index of product transition $l$, step size $\alpha_j$
Output: updated vector of internal probabilities $^{j+1}p^{[\,]}$ and $^{j+1}\Pi^a$

1: procedure UPDATE_P_DP_MULTI_LIN($^{j}p^{[\,]}$, $^{j}\Pi^a$, $a$, $l$, $\alpha_j$)
2: $\quad {}^{j}p^a := F_{^{j}\Pi^a}\left({}^{j}p^{[\,]}\right)$ ⊳ conversion into conditional probabilities
3: $\quad {}^{j+1}p^a :=$ UPDATE_P_SINGLE($^{j}p^a$, $l$, $\alpha_j$) ⊳ update of conditional probabilities
4: $\quad {}^{j+1}p^{[\,]} := F_{^{j}\Pi^a}^{-1}\left({}^{j+1}p^a\right)$ ⊳ conversion into internal probabilities
5: $\quad {}^{j+1}\Pi^a := {}^{j}\Pi^a$ ⊳ unchanged take-over of the fixed component
6: $\quad$ return ($^{j+1}p^{[\,]}$, $^{j+1}\Pi^a$)
7: end procedure
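Algorithm 5.2 can be transliterated into Python roughly as follows. The conversion map, its inverse, and the single-recommendation update are passed in as hypothetical callables, since their concrete forms are given by (5.14), (5.15), and Algorithm 5.1, which are not spelled out here:

```python
def update_p_dp_multi_lin(p_int, pi_a, a, l, alpha, F, F_inv, update_p_single):
    """Sketch of Algorithm 5.2 (all signatures are assumptions).

    p_int : vector of internal probabilities  ^j p^[]
    pi_a  : fixed probabilities               ^j Pi^a
    a     : delivered recommendations (a_1, ..., a_k)
    l     : index of the product transition
    alpha : step size alpha_j
    F, F_inv : the mapping (5.14) and its inverse (5.15)
    update_p_single : Algorithm 5.1 for single recommendations
    """
    p_cond = F(p_int, pi_a, a)                      # line 2: to conditional
    p_cond_new = update_p_single(p_cond, l, alpha)  # line 3: single update
    p_int_new = F_inv(p_cond_new, pi_a, a)          # line 4: back to internal
    pi_a_new = pi_a                                 # line 5: fixed part unchanged
    return p_int_new, pi_a_new                      # line 6
```

The sketch only fixes the control flow of the algorithm; all numerical work happens inside the callables.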
We now turn to the action-value function. For complexity reasons we cannot
check all combinations of single recommendations in order to determine the greedy
policy. We need a more efficient approach.
For the linear mapping, the action value of multiple recommendations $a = (a_1, \dots, a_k)$ decomposes into the single action values:

$$q_\pi(s, a) = \sum_{s'} p^a_{ss'}\, r_{ss'} = \sum_{s'} \left( \frac{1}{k} \sum_{i=1}^{k} p^{a_i}_{ss'} \right) r_{ss'} = \frac{1}{k} \sum_{i=1}^{k} \sum_{s'} p^{a_i}_{ss'}\, r_{ss'} = \frac{1}{k} \sum_{i=1}^{k} q_\pi(s, a_i).$$
This nice result tells us that in order to find the highest action value of $k$ recommendations, we just need to select the $k$ recommendations with the highest single action values. The same holds for the full action-value function (3.6).
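Assuming the single action values are available (here as a plain dict, a hypothetical stand-in for the engine's actual data structure), the greedy multiple recommendation can be sketched as:

```python
def greedy_multi_recommendation(single_q, k):
    """Return the k products with the highest single action values.

    By the linearity result above, the combined action value of the
    selected tuple a = (a_1, ..., a_k) is the mean of its single
    action values, so the top-k tuple maximizes q_pi(s, a).
    single_q maps each candidate recommendation to q_pi(s, a_i).
    """
    ranked = sorted(single_q, key=single_q.get, reverse=True)
    top_k = tuple(ranked[:k])
    combined_q = sum(single_q[a] for a in top_k) / k  # mean of single values
    return top_k, combined_q

# Example with made-up single action values:
q = {"p1": 0.2, "p2": 0.7, "p3": 0.4, "p4": 0.6}
recs, q_val = greedy_multi_recommendation(q, 2)
```

Selecting the top $k$ entries costs $O(|A| \log |A|)$ with a full sort, instead of checking all $\binom{|A|}{k}$ combinations.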