Building a Recommendation Engine: The XELOPES Library - Realtime Data Mining


returns an action which in turn is passed to the environment. If it receives a terminal state from the environment, it returns a null action that causes the environment to start a new episode. In this way, the interaction of Fig. 3.1 is supported by RLAgent and its associated Environment. The actual simulation is executed by the Simulation class, which finally presents some statistics.

12.2.2.2 RL Algorithm Packages

DP Package

The dynamic programming algorithms are organized in the DP package. It contains its own environment class, DPEnvironment, which extends Environment from the RL Core package. The central method of DPEnvironment is getEnvironmentModel, which returns the model of the environment as an instance of EnvironmentModel.

EnvironmentModel contains two methods, getTransProb and getTransRew, which return the transition probabilities $p^a_{ss'}$ and the transition rewards $r^a_{ss'}$, respectively. Both are modeled by the interface TransitionFunction, which represents the three-dimensional tensor of transition values from state $s$ to state $s'$ under action $a$.
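To make the idea of a three-dimensional tensor of transition values concrete, the following is a minimal sketch of a dense, array-backed tensor indexed by state, action, and successor state. The class name DenseTransitionTensor and all of its methods are hypothetical and chosen for illustration only; they are not the TransitionFunction interface of the library, whose signatures are not shown in this section.

```java
/** Hypothetical dense tensor of transition values indexed by (state, action, next state). */
final class DenseTransitionTensor {
    private final int numStates;
    private final int numActions;
    private final double[] values; // flattened layout: [state][action][nextState]

    DenseTransitionTensor(int numStates, int numActions) {
        if (numStates <= 0 || numActions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.numStates = numStates;
        this.numActions = numActions;
        this.values = new double[numStates * numActions * numStates];
    }

    /** Maps the three indices to a position in the flat array, with range checks. */
    private int index(int s, int a, int sNext) {
        if (s < 0 || s >= numStates || sNext < 0 || sNext >= numStates
                || a < 0 || a >= numActions) {
            throw new IndexOutOfBoundsException("invalid (state, action, nextState) index");
        }
        return (s * numActions + a) * numStates + sNext;
    }

    double get(int s, int a, int sNext) {
        return values[index(s, a, sNext)];
    }

    void set(int s, int a, int sNext, double value) {
        values[index(s, a, sNext)] = value;
    }

    /** Returns true if, for every (state, action) pair, the values over all next states sum to 1. */
    boolean isStochastic(double tolerance) {
        for (int s = 0; s < numStates; s++) {
            for (int a = 0; a < numActions; a++) {
                double sum = 0.0;
                for (int t = 0; t < numStates; t++) {
                    sum += get(s, a, t);
                }
                if (Math.abs(sum - 1.0) > tolerance) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two states, one action: both states move to state 1 with certainty.
        DenseTransitionTensor p = new DenseTransitionTensor(2, 1);
        p.set(0, 0, 1, 1.0);
        p.set(1, 0, 1, 1.0);
        System.out.println("valid probability tensor: " + p.isStochastic(1e-12));
    }
}
```

The same container could hold either transition probabilities or transition rewards; the row-sum check is only meaningful for probabilities.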

The abstract class DPAgent extends RLAgent and takes the model of the environment from its assigned DPEnvironment. Since DPAgent learns in offline mode, it has a method buildModel, similar to that of MiningAlgorithm from the data mining framework, that runs the learning by solving the Bellman equation (3.7). The policy of the DPAgent can be used only after this method has been called. The policy of DPAgent is always a greedy policy and hence an instance of the GreedyPolicy class.
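As an illustration of what a greedy policy does once a state-value function is available, the sketch below picks, for a given state, the action with the largest one-step lookahead value. It is not the GreedyPolicy class of the library: the class name GreedyPolicySketch, the method greedyAction, and the plain array layout p[s][a][sNext] and r[s][a][sNext] are assumptions made for this example.

```java
/** Hypothetical one-step-lookahead greedy action selection on a tabular model. */
final class GreedyPolicySketch {

    /**
     * Returns the action a that maximizes
     * sum over sNext of p[s][a][sNext] * (r[s][a][sNext] + gamma * v[sNext]).
     * Ties are resolved in favor of the lowest action index.
     */
    static int greedyAction(double[][][] p, double[][][] r, double[] v, double gamma, int s) {
        int bestAction = 0;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < p[s].length; a++) {
            double q = 0.0;
            for (int t = 0; t < v.length; t++) {
                q += p[s][a][t] * (r[s][a][t] + gamma * v[t]);
            }
            if (q > bestValue) {
                bestValue = q;
                bestAction = a;
            }
        }
        return bestAction;
    }

    public static void main(String[] args) {
        // Toy model (an assumption for this example): state 1 is absorbing with reward 0.
        // In state 0, action 0 stays in state 0 with reward 0,
        // and action 1 moves to state 1 with reward 1.
        double[][][] p = {
            {{1.0, 0.0}, {0.0, 1.0}},
            {{0.0, 1.0}, {0.0, 1.0}}
        };
        double[][][] r = {
            {{0.0, 0.0}, {0.0, 1.0}},
            {{0.0, 0.0}, {0.0, 0.0}}
        };
        double[] v = {1.0, 0.0}; // state values, assumed to be already computed
        System.out.println("greedy action in state 0: " + greedyAction(p, r, v, 0.9, 0));
    }
}
```

In the toy model, staying in state 0 is worth 0.9 and moving to state 1 is worth 1.0, so the greedy choice in state 0 is action 1.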

The classes PolicyIterationAgent and ValueIterationAgent both extend DPAgent and implement the policy iteration and value iteration algorithms explained in Sect. 3.9.4. They have only a few parameters, and in most cases the user does not need to care about them.
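For readers who want to see the kind of computation such an agent performs offline, here is a minimal sketch of textbook value iteration on a tabular model. It is not the implementation of ValueIterationAgent: the class ValueIterationSketch, the method solve, the array layout p[s][a][sNext] and r[s][a][sNext], and the stopping tolerance are all assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical value-iteration sketch on a small tabular Markov decision process. */
final class ValueIterationSketch {

    /**
     * Repeats the Bellman optimality backup in place until the largest change
     * of any state value within one sweep is at most the given tolerance.
     * Requires 0 <= gamma < 1 so that the iteration converges.
     */
    static double[] solve(double[][][] p, double[][][] r, double gamma, double tolerance) {
        int n = p.length;
        double[] v = new double[n];
        double delta;
        do {
            delta = 0.0;
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < p[s].length; a++) {
                    double q = 0.0;
                    for (int t = 0; t < n; t++) {
                        q += p[s][a][t] * (r[s][a][t] + gamma * v[t]);
                    }
                    best = Math.max(best, q);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best;
            }
        } while (delta > tolerance);
        return v;
    }

    public static void main(String[] args) {
        // Toy model (an assumption for this example): state 1 is absorbing with reward 0.
        // In state 0, action 0 stays in state 0 with reward 0,
        // and action 1 moves to state 1 with reward 1.
        double[][][] p = {
            {{1.0, 0.0}, {0.0, 1.0}},
            {{0.0, 1.0}, {0.0, 1.0}}
        };
        double[][][] r = {
            {{0.0, 0.0}, {0.0, 1.0}},
            {{0.0, 0.0}, {0.0, 0.0}}
        };
        double[] v = solve(p, r, 0.9, 1e-9);
        System.out.println("state values: " + Arrays.toString(v));
    }
}
```

The discount factor 0.9 matches the value passed to setGamma in Example 12.21; the tolerance stands in for the kind of parameter that, as noted above, users rarely need to change.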

Example 12.21 We show an example that solves the GridWorld problem of [SB98]. (Note that most of the implementation effort lies in the environment class GridJumpEnvironment, which is not listed here.)

// Create agent settings:
RLAgentSettings agentSettings = new RLAgentSettings();
agentSettings.setInputDataSpecification(metaData);
agentSettings.setGamma(0.9);
agentSettings.verifySettings();

// Get default agent specification from 'agents.xml':
AgentSpecification agentSpecification =
    AgentSpecification.getAgentSpecification("PolicyIterationAgent");
