// Set agent parameters:
agentSpecification.setAPValue("maxPolIter", 100);
agentSpecification.setAPValue("maxEvalIter", 200);
agentSpecification.setAPValue("theta", 0.0001);
// Create algorithm object with default values:
DPAgent agent = (DPAgent) agentSpecification.createAgentInstance();
// Put it all together:
agent.setAgentSettings(agentSettings);
agent.verify();
// Create DP environment:
DPEnvironment env = new GridJumpEnvironment();
// Create and init simulation object:
Simulation sim = new Simulation(agent, env);
sim.init(null); // assigns environment to agent
// Build DP model solving Bellman equation:
System.out.println("TRAINING");
agent.buildModel();
System.out.println( agent.getVfunction() ); // optimal state-value function
// Run simulation:
System.out.println("SIMULATION");
int maxStepsPerTrial = 10;
sim.steps(maxStepsPerTrial);
System.out.println("total time [s]: " + sim.getTimeSpentToRunTrials() );
MC Package
The Monte Carlo algorithms are organized in the MC package. These algorithms are simple, and the package contains basic implementations such as OnPolicyMCAgent for the on-policy MC algorithm and OffPolicyMCAgent for the off-policy MC algorithm. Consult [SB98] for these algorithms and their parameters, whose names in XELOPES are consistent with that reference.
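An OnPolicyMCAgent can be set up along the same lines as the DPAgent above. The following is only a minimal sketch by analogy with that example; the parameter names ("epsilon", "gamma"), the number of training steps, and the environment variable env are assumptions chosen for illustration, not taken from the XELOPES documentation.
// Sketch of an on-policy MC setup, by analogy with the DP example above.
// agentSpecification is assumed to have been created for OnPolicyMCAgent;
// the parameter names "epsilon" and "gamma" are assumed, not documented here.
agentSpecification.setAPValue("epsilon", 0.1); // exploration rate (assumed name)
agentSpecification.setAPValue("gamma", 0.9);   // discount factor (assumed name)
// Create algorithm object with default values:
OnPolicyMCAgent mcAgent = (OnPolicyMCAgent) agentSpecification.createAgentInstance();
mcAgent.setAgentSettings(agentSettings);
mcAgent.verify();
// Create and init simulation; env stands for an episodic environment instance:
Simulation mcSim = new Simulation(mcAgent, env);
mcSim.init(null); // assigns environment to agent
// Unlike DP, the MC agent learns from simulated experience:
mcSim.steps(1000); // number of steps chosen arbitrarily for illustration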
TD Package
The temporal-difference learning algorithms are organized in the TD package.
Examples are the classes SarsaAgent for the on-policy Sarsa algorithm, SarsaLambdaAgent for the on-policy Sarsa(λ) algorithm, and WatkinsQAgent for Watkins's off-policy Q-learning algorithm.
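To illustrate what SarsaAgent computes, independently of the XELOPES API, the following self-contained sketch shows the tabular Sarsa update from [SB98]; the class and variable names are chosen here for illustration only and are not part of the library.
// Minimal illustration of the tabular Sarsa update from [SB98];
// all names are illustrative and not part of the XELOPES API.
public class SarsaUpdateDemo {
    public static void main(String[] args) {
        int numStates = 5, numActions = 2;
        double[][] q = new double[numStates][numActions]; // action-value table Q(s,a)
        double alpha = 0.1;  // learning rate
        double gamma = 0.9;  // discount factor
        // One transition (s, a, r, s', a') of an episode:
        int s = 0, a = 1;          // current state and action
        double r = 1.0;            // observed reward
        int sNext = 2, aNext = 0;  // next state and the action actually chosen there
        // On-policy TD update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))
        q[s][a] += alpha * (r + gamma * q[sNext][aNext] - q[s][a]);
        System.out.println("Updated Q(s,a) = " + q[s][a]);
    }
}
Q-learning differs only in that the bootstrap term uses the maximum of Q(s', a) over all actions instead of the action actually taken in s', which is what makes it off-policy.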