The question of the best value of $s_C$ is difficult; it is a subject of forthcoming investigations and will not be addressed further here.
In closing, let us turn our attention to a further special problem of the conditional version. If a rule is no longer applied for recommendations after exceeding the threshold value $n_{\min}$ (because in the meantime other rules have become preferred), it has, at least in the simplified version (5.26), in general little chance of being applied again, since the conditional probability $p_{s s'}$ is no longer updated. This holds even if its potential acceptance has increased again.
In order to get around this, we introduce a special explorative delivery mode for the DP algorithm. For this, similarly to the $\varepsilon$-greedy policy, a percentage rate $\varepsilon_{DP}$ is specified with which, instead of being delivered according to the action-value function $q_\pi(s, a)$, the recommendations are delivered in descending order according to the following criterion:
$$ r_{s, s_a} := \Delta\left(p_{s, s_a}\right) = p^{\Theta}_{s, s_a} - p_{s, s_a} \qquad (5.27) $$
Thus, the idea is that the difference between the unconditional probability $p^{\Theta}_{s, s_a}$ and the conditional probability $p_{s, s_a}$ is a good indicator of whether a rule has become more attractive again. For if the difference increases, the user is more inclined toward product $s_a$ even without a recommendation, and the necessity of its delivery increases.
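To make the explorative delivery mode concrete, the following minimal Python sketch shows one possible way to combine the two ranking criteria. The names epsilon_dp, p_uncond, p_cond, and q are our own illustrative placeholders for the rate $\varepsilon_{DP}$, the unconditional and conditional transition probabilities, and the action-value function; they are not taken from the original algorithm.

    import random

    def deliver_recommendations(state, candidates, q, p_uncond, p_cond,
                                epsilon_dp=0.1, k=3):
        """Return k recommendations for the current state.

        With probability epsilon_dp the explorative mode is used: candidates
        are ranked by the difference between the unconditional and the
        conditional transition probability, as in criterion (5.27).
        Otherwise they are ranked by the action-value function q.
        All parameter names are illustrative placeholders.
        """
        if random.random() < epsilon_dp:
            # explorative mode: rank by p^Theta_{s,s_a} - p_{s,s_a}
            score = lambda a: p_uncond.get((state, a), 0.0) - p_cond.get((state, a), 0.0)
        else:
            # normal mode: rank by the action-value function q_pi(s, a)
            score = lambda a: q.get((state, a), 0.0)
        return sorted(candidates, key=score, reverse=True)[:k]

In this sketch, rules whose unconditional probability has grown while their conditional estimate remained stale rise to the top of the explorative ranking and thus get a renewed chance of being delivered and re-estimated.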
Let us emphasize that the empirical approach of this section presents only some very first and simple means of handling the crucial problem of statistical stability of the DP version. Surely, much more advanced instruments can be developed. Beyond this, in Chaps. 6, 7, 8, 9, and 10 we will develop mathematically more demanding methods to increase the stability of our RL approach for recommendations.
That concludes our tour of the basic RL methods for our RE framework. Let us now turn to their experimental evaluation.
5.4 Experimental Results
In this section we will present experimental results for the approaches of Sects. 5.2
and 5.3. To this end, we will first verify the central Assumption 5.2 experimentally.
After that we extend the simulation of Sect. 4.4 in such a way that we first model the
environment, i.e., the conditional transition probabilities and rewards. Then, for the
actual simulation, we will use the environment model in order to generate an
arbitrary number of virtual sessions for testing the recommendation algorithms
under conditions close to reality. At the end of this section, we will use the extended
simulation for testing the algorithms introduced in this chapter.
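To indicate what such a simulation may look like, here is a minimal Python sketch under our own simplifying assumptions: a plain first-order transition model estimated from recorded sessions, without rewards and without modeling the effect of recommendations. The actual environment model described above additionally covers the conditional transition probabilities and rewards; all function and variable names here are illustrative.

    import random
    from collections import defaultdict

    def estimate_model(sessions):
        """Estimate transition probabilities from observed sessions.

        Each session is a list of visited states (products). The result maps a
        state s to a list of (successor, probability) pairs.
        """
        counts = defaultdict(lambda: defaultdict(int))
        for session in sessions:
            for s, s_next in zip(session, session[1:]):
                counts[s][s_next] += 1
        model = {}
        for s, succ in counts.items():
            total = sum(succ.values())
            model[s] = [(s_next, n / total) for s_next, n in succ.items()]
        return model

    def generate_session(model, start, max_len=20):
        """Generate one virtual session by sampling successors from the model."""
        session = [start]
        while len(session) < max_len and session[-1] in model:
            successors, probs = zip(*model[session[-1]])
            session.append(random.choices(successors, weights=probs)[0])
        return session

Once such a model has been fitted to real usage data, an arbitrary number of virtual sessions can be sampled from it, which is what allows the recommendation algorithms to be tested under conditions close to reality.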