// Apply policy 10 times:
for (int i = 0; i < 10; i++)
  System.out.println("next action: " + egp.nextAction());
// Result, e.g.:
// next action: action: 2.0 index = 1
// next action: action: 2.0 index = 1
// next action: action: 1.0 index = 0
// next action: action: 2.0 index = 1
// next action: action: 2.0 index = 1
// next action: action: 3.0 index = 2
// next action: action: 2.0 index = 1
...
■ Agent, Environment
The central class of the RL package is, of course, RLAgent. It extends the general Agent from Sect. 12.2.1. The generic parameter of RLAgent is Action because its apply and learnApply methods return Action objects. Unlike the general agent framework of XELOPES, the RL package contains a base Environment class.
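The class relationship just described can be illustrated with a minimal sketch. Note that these are hypothetical stand-ins, not the real XELOPES signatures: only the generic return type of the agent is taken from the text.

```java
// Hypothetical sketch of the relationship described above; the XELOPES
// signatures are assumptions, reduced to the generic return type.
abstract class Agent<T> {
    public abstract T apply(Object state);
}

class Action {
    final double value;
    Action(double value) { this.value = value; }
}

// RLAgent fixes the generic parameter to Action,
// so apply (and learnApply) return Action objects.
abstract class RLAgent extends Agent<Action> { }

// A trivial concrete agent that always proposes the same action.
class ConstantAgent extends RLAgent {
    @Override
    public Action apply(Object state) { return new Action(2.0); }
}

public class Main {
    public static void main(String[] args) {
        Action a = new ConstantAgent().apply(null);
        System.out.println("action: " + a.value);  // action: 2.0
    }
}
```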
Environment is an abstract class that extends EnvironmentInformation (Sect. 12.2.1) and implements the base environment functionality of the RL package.
RLAgent has an associated settings class RLAgentSettings that extends the general AgentSettings from Sect. 12.2.1. It stores some basic parameters like the discount rate and contains a description of the agent's metadata.
Further, RLAgent contains the variables vfunction for the state-value function, qfunction for the action-value function, and policy for the policy of the agent (not all of them must be used). It also holds a reference to its Environment. For the case where the agent knows its environment model (i.e., the transition probabilities and rewards), the variable envModel of RLAgent can be used. It is of the class EnvironmentModel, which contains interfaces to access the transition probabilities and rewards.
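Since the actual EnvironmentModel interface is not shown here, the following is a hypothetical tabular sketch of how such a model, exposing transition probabilities and expected rewards, could be used for a one-step Bellman backup of an action value. All names and signatures are assumptions for illustration.

```java
// Hypothetical sketch (not the actual XELOPES API): a tabular environment
// model exposing transition probabilities and expected rewards.
interface EnvironmentModel {
    double transitionProbability(int s, int a, int s2);
    double expectedReward(int s, int a, int s2);
}

class TabularModel implements EnvironmentModel {
    private final double[][][] p, r;  // indexed [s][a][s']
    TabularModel(double[][][] p, double[][][] r) { this.p = p; this.r = r; }
    public double transitionProbability(int s, int a, int s2) { return p[s][a][s2]; }
    public double expectedReward(int s, int a, int s2) { return r[s][a][s2]; }
}

public class Main {
    // One-step backup: Q(s,a) = sum over s' of p(s'|s,a) * (r(s,a,s') + gamma * V(s'))
    static double qValue(EnvironmentModel m, int s, int a,
                         double[] v, double gamma, int nStates) {
        double q = 0.0;
        for (int s2 = 0; s2 < nStates; s2++)
            q += m.transitionProbability(s, a, s2)
               * (m.expectedReward(s, a, s2) + gamma * v[s2]);
        return q;
    }

    public static void main(String[] args) {
        // Two states, one action: from state 0, a 50/50 chance of staying or
        // moving to state 1; moving pays reward 1.
        double[][][] p = {{{0.5, 0.5}}, {{0.0, 1.0}}};
        double[][][] r = {{{0.0, 1.0}}, {{0.0, 0.0}}};
        double[] v = {0.0, 2.0};
        double q = qValue(new TabularModel(p, r), 0, 0, v, 0.9, 2);
        // 0.5*(0 + 0.9*0) + 0.5*(1 + 0.9*2) = 1.4, up to floating-point rounding
        System.out.println("Q(0,0) = " + q);
    }
}
```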
The design of the Environment class was motivated by the RL implementation of Sutton and Santamaria [StSa96]. To this end, the following method is contained in Environment:
public abstract StateRewardVector step(Action action)
    throws MiningException;
This method will be called once by the simulation instance in each step of the simulation. step causes the environment to undergo a transition from its current state to a next state, dependent on the action. The method returns the next state and reward as a StateRewardVector object. If action is null, a new episode starts.
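The step contract described above can be sketched with a toy environment. The types below are minimal stand-ins for the XELOPES classes (the real StateRewardVector and MiningException are not reproduced here); only the step semantics, including the null-action episode reset, follow the text.

```java
// Minimal stand-ins for the XELOPES types, for illustration only.
class Action {
    final double value;
    Action(double value) { this.value = value; }
}

class StateRewardVector {
    final int state;
    final double reward;
    StateRewardVector(int state, double reward) {
        this.state = state;
        this.reward = reward;
    }
}

abstract class Environment {
    public abstract StateRewardVector step(Action action) throws Exception;
}

// A toy 1-D walk: each action shifts the state by its value;
// reaching state 5 pays reward 1.
class WalkEnvironment extends Environment {
    private int state = 0;

    @Override
    public StateRewardVector step(Action action) {
        if (action == null) {            // null action starts a new episode
            state = 0;
            return new StateRewardVector(state, 0.0);
        }
        state += (int) action.value;     // transition depends on the action
        double reward = (state == 5) ? 1.0 : 0.0;
        return new StateRewardVector(state, reward);
    }
}

public class Main {
    public static void main(String[] args) {
        WalkEnvironment env = new WalkEnvironment();
        env.step(null);                  // start episode
        StateRewardVector sr = null;
        for (int i = 0; i < 5; i++) sr = env.step(new Action(1.0));
        System.out.println("state=" + sr.state + " reward=" + sr.reward);
        // state=5 reward=1.0
    }
}
```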
The learnApply method of RLAgent, inherited from Agent, with a StateRewardVector object as argument serves as the counterpart to the step method from the agent side. It takes the next state and reward from the environment and