the Watkins Q-learning off-policy algorithm, and WatkinsQLambdaAgent for the Watkins Q(λ) off-policy algorithm. Again, the names of all parameters are consistent with [SB98].
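As a brief reminder (not part of the original listing), the one-step Q-learning backup that both agents build on is, in the notation of [SB98]:

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t) ]

Watkins's Q(λ) extends this with eligibility traces decaying by λ per step, where the traces are cut back to zero whenever an exploratory (non-greedy) action is taken; see [SB98] for the full derivation.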
Example 12.22
We give a modified GridWorld example representing an episodic task (in contrast to our previous GridWorld, which was a continuing task). This new GridWorld has a terminal state after which the episode terminates. Here the reward is −1 for all transitions; thus, we want to reach the terminal state as fast as possible. As in the previous example, we omit the implementation of the GridEnvironment and focus on the solution process; an illustrative sketch of its dynamics follows below.
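Since the GridEnvironment implementation is omitted here, the following is only a minimal sketch of its core dynamics: a grid with one terminal cell, a reward of −1 per transition, and a reset at the start of each episode. All class and method names in the sketch (SketchGridEnvironment, step, getReward, isTerminalState, reset) are illustrative assumptions and do not reflect the library's actual Environment interface.

// Illustrative sketch only; names are assumptions, not the library's API.
public class SketchGridEnvironment {
    private final int width = 5, height = 5;
    private final int goalX = 4, goalY = 4;   // terminal cell (assumed position)
    private int x = 0, y = 0;                 // current agent position

    // Apply an action (0=up, 1=down, 2=left, 3=right), staying inside the grid.
    public void step(int action) {
        switch (action) {
            case 0: y = Math.min(y + 1, height - 1); break;
            case 1: y = Math.max(y - 1, 0); break;
            case 2: x = Math.max(x - 1, 0); break;
            case 3: x = Math.min(x + 1, width - 1); break;
        }
    }

    // Reward is -1 for every transition, so shorter episodes are better.
    public double getReward() { return -1.0; }

    // The episode terminates when the goal cell is reached.
    public boolean isTerminalState() { return x == goalX && y == goalY; }

    // Return to the start state at the beginning of each episode.
    public void reset() { x = 0; y = 0; }
}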
// Create agent settings:
TDAgentSettings agentSettings = new TDAgentSettings();
agentSettings.setInputDataSpecification(metaData);
agentSettings.setGamma(1.0);   // discount rate gamma [SB98]
agentSettings.setAlpha(0.01);  // step-size parameter alpha [SB98]
agentSettings.setLambda(0.9);  // trace-decay parameter lambda [SB98]
agentSettings.verifySettings();
// Get default agent specification from 'agents.xml':
AgentSpecification agentSpecification =
    AgentSpecification.getAgentSpecification("SarsaLambdaAgent");
// Create algorithm object with default values:
RLAgent agent = (RLAgent) agentSpecification.createAgentInstance();
// Put it all together:
agent.setAgentSettings(agentSettings);
agent.verify();
// Create environment:
Environment env = new GridEnvironment();
// Create and init simulation object:
Simulation sim = new Simulation(agent, env);
sim.init(null); // assigns environment to agent
// Run simulation:
int numTrials = 10000;
int maxStepsPerTrial = 100;
sim.setTrialDevisor(1000);
sim.trials(numTrials, maxStepsPerTrial);
System.out.println("total time [s]: " + sim.getTimeSpentToRunTrials());
■
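To run the Watkins Q(λ) off-policy variant introduced above instead of Sarsa(λ), it should suffice to request the corresponding agent specification; the rest of the setup stays the same. The specification name below is taken from the agent class name mentioned earlier and is assumed to match the entry in 'agents.xml':

AgentSpecification agentSpecification =
    AgentSpecification.getAgentSpecification("WatkinsQLambdaAgent");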