returns an action which in turn is passed to the environment. If it receives a terminal
state from the environment, it returns a null action that causes the environment to
start a new episode. In this way, the interaction of Fig. 3.1 is supported by RLAgent
and its associated Environment. The actual simulation is executed by the Simulation
class, which presents some statistics at the end.
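The interaction described above can be sketched as a small loop. This is only an illustration under stated assumptions: ChainEnvironment, AlwaysRightAgent, and SimulationSketch are hypothetical stand-ins invented for this sketch and are not the framework's Environment, RLAgent, or Simulation classes. The only behavior taken from the text is that the agent answers a terminal state with a null action, which makes the environment start a new episode.

```java
// Hypothetical five-state chain; reaching state GOAL ends the episode.
final class ChainEnvironment {
    static final int GOAL = 4;
    private int state = 0;

    /** Starts a new episode and returns the initial state. */
    int reset() { state = 0; return state; }

    /** Moves left (action 0) or right (action 1); reward 1.0 on reaching GOAL. */
    double step(int action) {
        int move = (action == 1) ? 1 : -1;
        state = Math.max(0, Math.min(GOAL, state + move));
        return state == GOAL ? 1.0 : 0.0;
    }

    int state() { return state; }
    boolean isTerminal() { return state == GOAL; }
}

// Hypothetical agent with a fixed policy: always move right.
final class AlwaysRightAgent {
    /** Returns null on a terminal state to signal that a new episode should start. */
    Integer nextAction(int state, boolean terminal) {
        return terminal ? null : Integer.valueOf(1);
    }
}

// Hypothetical driver that runs the loop and collects simple statistics.
final class SimulationSketch {
    int episodes = 0;
    int steps = 0;
    double totalReward = 0.0;

    void run(ChainEnvironment env, AlwaysRightAgent agent, int maxEpisodes) {
        int state = env.reset();
        boolean terminal = false;
        while (episodes < maxEpisodes) {
            Integer action = agent.nextAction(state, terminal);
            if (action == null) {          // null action: episode finished
                episodes++;
                state = env.reset();
                terminal = false;
                continue;
            }
            totalReward += env.step(action); // pass the action to the environment
            steps++;
            state = env.state();
            terminal = env.isTerminal();
        }
    }

    public static void main(String[] args) {
        SimulationSketch sim = new SimulationSketch();
        sim.run(new ChainEnvironment(), new AlwaysRightAgent(), 3);
        System.out.println("episodes=" + sim.episodes
                + " steps=" + sim.steps + " totalReward=" + sim.totalReward);
    }
}
```

Keeping the episode counter in the driver rather than in the agent mirrors the division of labor in the text, where the simulation class is the one that reports statistics.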
12.2.2.2 RL Algorithm Packages
DP Package
The dynamic programming algorithms are organized in the DP package. It contains
its own environment class, DPEnvironment, which extends Environment from the
RL Core package. The central method of DPEnvironment is getEnvironmentModel,
which returns the model of the environment as an instance of
EnvironmentModel.
EnvironmentModel contains two methods, getTransProb and getTransRew, which
return the transition probabilities $p^a_{ss'}$ and the transition rewards
$r^a_{ss'}$, respectively. Both are modeled by the interface TransitionFunction,
which represents the three-dimensional tensor of transition values from state s
to state s' under action a.
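To make the tensor view concrete, the following is a minimal sketch of how such a model could be stored. All names here (TransitionTensor, ArrayTransitionTensor, ModelSketch) and the index order (s, a, s') are assumptions made for illustration; the actual TransitionFunction and EnvironmentModel interfaces are not listed in the text and may look different.

```java
// Hypothetical read-only view of a three-dimensional tensor of transition values.
interface TransitionTensor {
    /** Value for the transition from state s to state sNext under action a. */
    double get(int s, int a, int sNext);
}

// Dense array-backed implementation, indexed as values[s][a][s'].
final class ArrayTransitionTensor implements TransitionTensor {
    private final double[][][] values;

    ArrayTransitionTensor(int numStates, int numActions) {
        values = new double[numStates][numActions][numStates];
    }

    void set(int s, int a, int sNext, double value) {
        values[s][a][sNext] = value;
    }

    @Override
    public double get(int s, int a, int sNext) {
        return values[s][a][sNext];
    }
}

// Hypothetical model holding one tensor for probabilities and one for rewards.
final class ModelSketch {
    final TransitionTensor transProb;
    final TransitionTensor transRew;
    final int numStates;

    ModelSketch(TransitionTensor transProb, TransitionTensor transRew, int numStates) {
        this.transProb = transProb;
        this.transRew = transRew;
        this.numStates = numStates;
    }

    /** Expected immediate reward of action a in state s: sum over s' of p * r. */
    double expectedReward(int s, int a) {
        double sum = 0.0;
        for (int sNext = 0; sNext < numStates; sNext++) {
            sum += transProb.get(s, a, sNext) * transRew.get(s, a, sNext);
        }
        return sum;
    }

    /** Sum of transition probabilities out of (s, a); should be 1 for a valid model. */
    double probabilityMass(int s, int a) {
        double sum = 0.0;
        for (int sNext = 0; sNext < numStates; sNext++) {
            sum += transProb.get(s, a, sNext);
        }
        return sum;
    }
}
```

A dense array is the simplest choice for small state spaces; for large, sparse models a map-based implementation behind the same interface would likely be preferable.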
The abstract class DPAgent extends RLAgent and takes the model of the
environment from its assigned DPEnvironment. Since DPAgent learns in
offline mode, it has a method buildModel, similar to that of MiningAlgorithm
from the data mining framework, which runs the learning by solving the Bellman
equation (3.7). The policy of the DPAgent can be used only after this method
has been called. The policy of DPAgent is always greedy and hence an instance
of the GreedyPolicy class.
The classes PolicyIterationAgent and ValueIterationAgent both extend DPAgent
and implement the policy iteration and value iteration algorithms explained in
Sect. 3.9.4. They have only a few parameters, and in most cases the user does
not need to adjust them.
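As a rough illustration of what an offline learning step followed by a greedy policy involves, here is a textbook value iteration sketch on plain arrays. It is not the code of ValueIterationAgent: the class name ValueIterationSketch, the array layout p[s][a][s'] and r[s][a][s'], and the stopping tolerance are all assumptions of this sketch, and it uses the in-place (Gauss-Seidel style) update variant.

```java
// Hypothetical stand-alone value iteration over a tabular model.
final class ValueIterationSketch {

    /** One-step lookahead: sum over s' of p * (r + gamma * v[s']). */
    static double backup(double[][][] p, double[][][] r, double[] v,
                         double gamma, int s, int a) {
        double q = 0.0;
        for (int sNext = 0; sNext < p[s][a].length; sNext++) {
            q += p[s][a][sNext] * (r[s][a][sNext] + gamma * v[sNext]);
        }
        return q;
    }

    /** Sweeps over all states until the largest change drops to tol or below. */
    static double[] solve(double[][][] p, double[][][] r, double gamma, double tol) {
        double[] v = new double[p.length];
        double delta;
        do {
            delta = 0.0;
            for (int s = 0; s < p.length; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < p[s].length; a++) {
                    best = Math.max(best, backup(p, r, v, gamma, s, a));
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best; // in-place update
            }
        } while (delta > tol);
        return v;
    }

    /** Greedy policy: for each state, the first action with maximal lookahead value. */
    static int[] greedyPolicy(double[][][] p, double[][][] r, double[] v, double gamma) {
        int[] policy = new int[p.length];
        for (int s = 0; s < p.length; s++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < p[s].length; a++) {
                double q = backup(p, r, v, gamma, s, a);
                if (q > best) {
                    best = q;
                    policy[s] = a;
                }
            }
        }
        return policy;
    }

    public static void main(String[] args) {
        // Two states; state 1 is absorbing. In state 0, action 0 stays
        // (reward 0) and action 1 moves to state 1 (reward 1).
        double[][][] p = {
            {{1.0, 0.0}, {0.0, 1.0}},
            {{0.0, 1.0}, {0.0, 1.0}}
        };
        double[][][] r = {
            {{0.0, 0.0}, {0.0, 1.0}},
            {{0.0, 0.0}, {0.0, 0.0}}
        };
        double[] v = solve(p, r, 0.9, 1e-9);
        int[] policy = greedyPolicy(p, r, v, 0.9);
        System.out.println("V(0)=" + v[0] + " V(1)=" + v[1]
                + " policy(0)=" + policy[0]);
    }
}
```

The split into solve and greedyPolicy reflects the constraint stated above: the greedy policy is only meaningful once the value function has been computed.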
Example 12.21 We show an example that solves the GridWorld problem of
[SB98]. (Note that most of the implementation effort lies in the environment
class GridJumpEnvironment, which is not listed here.)
// Create agent settings:
RLAgentSettings agentSettings = new RLAgentSettings();
agentSettings.setInputDataSpecification(metaData);
agentSettings.setGamma(0.9);
agentSettings.verifySettings();
// Get default agent specification from 'agents.xml':
AgentSpecification agentSpecification =
    AgentSpecification.getAgentSpecification("PolicyIterationAgent");