the Watkins Q-learning off-policy algorithm, and WatkinsQLambdaAgent for the
Watkins Q(λ) off-policy algorithm. Again, the names of all parameters are
consistent with [SB98].
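For reference, the parameter names gamma, alpha, and lambda used below correspond to the symbols γ, α, and λ in the Watkins Q(λ) update as given in [SB98]; a sketch of that update, assuming accumulating eligibility traces e(s, a), is:

    δ_t = r_{t+1} + γ max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t)
    e(s_t, a_t) ← e(s_t, a_t) + 1
    Q(s, a) ← Q(s, a) + α δ_t e(s, a)        for all s, a
    e(s, a) ← γ λ e(s, a)                    for all s, a

where the traces are reset to zero after an exploratory (non-greedy) action, which is what distinguishes Watkins Q(λ) from Sarsa(λ).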
Example 12.22 We give an example of a modified GridWorld example
representing an episodic task (in contrast to our previous GridWorld, which was a
continuing task). This new GridWorld has a terminal state after which the episode
terminates. Here the reward is -1 for all transitions; thus, we want to reach the
terminal state as fast as possible. As in the previous example, we omit the
implementation of the GridEnvironment but focus on the solution process.
// Create agent settings:
TDAgentSettings agentSettings = new TDAgentSettings();
agentSettings.setInputDataSpecification(metaData);
agentSettings.setGamma(1.0);
agentSettings.setAlpha(0.01);
agentSettings.setLambda(0.9);
agentSettings.verifySettings();
// Get default agent specification from 'agents.xml':
AgentSpecification agentSpecification =
    AgentSpecification.getAgentSpecification("SarsaLambdaAgent");
// Create algorithm object with default values:
RLAgent agent = (RLAgent) agentSpecification.createAgentInstance();
// Put it all together:
agent.setAgentSettings(agentSettings);
agent.verify();
// Create environment:
Environment env = new GridEnvironment();
// Create and init simulation object:
Simulation sim = new Simulation(agent, env);
sim.init(null); // assigns environment to agent
// Run simulation:
int numTrials = 10000;
int maxStepsPerTrial = 100;
sim.setTrialDevisor(1000);
sim.trials(numTrials, maxStepsPerTrial);
System.out.println("total time [s]: " + sim.getTimeSpentToRunTrials());
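To run the same task with the Watkins Q(λ) agent mentioned above instead of Sarsa(λ), only the agent specification needs to change; a minimal sketch, assuming an entry named "WatkinsQLambdaAgent" is registered in 'agents.xml', is:

// Assumed: 'agents.xml' contains an entry named "WatkinsQLambdaAgent".
AgentSpecification watkinsSpec =
    AgentSpecification.getAgentSpecification("WatkinsQLambdaAgent");
RLAgent watkinsAgent = (RLAgent) watkinsSpec.createAgentInstance();
watkinsAgent.setAgentSettings(agentSettings); // reuse the settings from above
watkinsAgent.verify();
// The remaining Simulation setup and the sim.trials(...) call stay unchanged.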