returns an action which in turn is passed to the environment. If it receives a terminal state from the environment, it returns a null action that causes the environment to start a new episode. In this way, the interaction of Fig. 3.1 is supported by RLAgent and its associated Environment. The actual simulation is executed by the Simulation class, which finally presents some statistics.
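The interaction loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the classes ChainEnvironment and StepRightAgent and the method simulate are hypothetical stand-ins, not the RLAgent, Environment, or Simulation classes of the library, and the toy chain with four states is invented for the example.

```java
/** Hypothetical stand-ins; not the library's RLAgent, Environment or Simulation API. */
final class InteractionSketch {
    /** Toy chain environment (assumption): states 0..3, state 3 is terminal. */
    static final class ChainEnvironment {
        static final int TERMINAL = 3;
        int state = 0;
        int episodes = 0;

        /** Applies the action; a null action starts a new episode. */
        int step(Integer action) {
            if (action == null) {
                state = 0;
                episodes++;
            } else {
                state = Math.min(TERMINAL, state + action);
            }
            return state;
        }
    }

    /** Agent that always moves one state to the right and answers a terminal state with null. */
    static final class StepRightAgent {
        Integer apply(int state) {
            return state == ChainEnvironment.TERMINAL ? null : 1;
        }
    }

    /** Runs the loop for a fixed number of steps and returns how many episodes were restarted. */
    static int simulate(int steps) {
        ChainEnvironment env = new ChainEnvironment();
        StepRightAgent agent = new StepRightAgent();
        int state = env.state;
        for (int i = 0; i < steps; i++) {
            Integer action = agent.apply(state); // the agent maps a state to an action
            state = env.step(action);            // the environment answers with the next state
        }
        return env.episodes;
    }
}
```

With eight steps the agent reaches the terminal state twice, so InteractionSketch.simulate(8) counts two restarted episodes.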
12.2.2.2 RL Algorithm Packages
DP Package
The dynamic programming algorithms are organized in the DP package. It contains its own environment class DPEnvironment, which extends Environment from the RL Core package. The central method of DPEnvironment is getEnvironmentModel, which returns the model object of the environment, an instance of EnvironmentModel. EnvironmentModel contains two methods, getTransProb and getTransRew, which return the transition probabilities $p^{a}_{ss'}$ and transition rewards $r^{a}_{ss'}$, respectively. Both are modeled by the interface TransitionFunction, which represents the three-dimensional tensor of transition values from state $s$ to state $s'$ under action $a$.
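To make the tensor view concrete, the following sketch stores both transition tensors as three-dimensional arrays. All names apart from getTransProb and getTransRew are assumptions made for illustration (TransitionValues, ArrayTransitionValues, Model, expectedReward, twoStateExample), and the signatures shown need not match the library's TransitionFunction or EnvironmentModel.

```java
/** Hypothetical sketch; the library's TransitionFunction and EnvironmentModel may differ. */
final class ModelSketch {
    /** Value attached to the transition from state s to state sNext under action a. */
    interface TransitionValues {
        double get(int s, int a, int sNext);
    }

    /** Dense tensor stored as a three-dimensional array indexed [s][a][sNext]. */
    static final class ArrayTransitionValues implements TransitionValues {
        private final double[][][] values;

        ArrayTransitionValues(double[][][] values) {
            this.values = values;
        }

        @Override
        public double get(int s, int a, int sNext) {
            return values[s][a][sNext];
        }
    }

    /** Bundles the two tensors, with one accessor per tensor as in the text. */
    static final class Model {
        private final TransitionValues prob;
        private final TransitionValues rew;
        final int numStates;

        Model(TransitionValues prob, TransitionValues rew, int numStates) {
            this.prob = prob;
            this.rew = rew;
            this.numStates = numStates;
        }

        TransitionValues getTransProb() { return prob; }
        TransitionValues getTransRew() { return rew; }
    }

    /** Expected immediate reward of action a in state s: sum over s' of p * r. */
    static double expectedReward(Model m, int s, int a) {
        double sum = 0.0;
        for (int n = 0; n < m.numStates; n++) {
            sum += m.getTransProb().get(s, a, n) * m.getTransRew().get(s, a, n);
        }
        return sum;
    }

    /** Invented example: action 0 stays, action 1 moves with probability 0.8; reaching state 1 pays 1. */
    static Model twoStateExample() {
        double[][][] p = {
            {{1.0, 0.0}, {0.2, 0.8}},
            {{0.0, 1.0}, {0.8, 0.2}}
        };
        double[][][] r = {
            {{0.0, 1.0}, {0.0, 1.0}},
            {{0.0, 1.0}, {0.0, 1.0}}
        };
        return new Model(new ArrayTransitionValues(p), new ArrayTransitionValues(r), 2);
    }
}
```

Hiding the array behind an interface keeps the door open for sparse storage, which matters once most transitions have probability zero.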
The abstract class DPAgent extends RLAgent and takes the model of the environment from its assigned DPEnvironment. Since DPAgent learns in offline mode, it has a method similar to that of MiningAlgorithm from the data mining
Only after this method has been called can the policy of the DPAgent be used. The policy of DPAgent is always a greedy policy and hence an instance of the GreedyPolicy class.
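The contract described here (learn offline first, then use a greedy policy) might look as follows in a minimal sketch. The class OfflineAgentSketch and its methods learn and act are hypothetical and are not the library's DPAgent or GreedyPolicy; the learning step is assumed to be a fixed number of value iteration sweeps, and the model is passed in as plain arrays.

```java
/** Hypothetical sketch of an offline-learning agent; not the library's DPAgent. */
final class OfflineAgentSketch {
    private final double[][][] p; // p[s][a][sNext], transition probabilities
    private final double[][][] r; // r[s][a][sNext], transition rewards
    private final double gamma;   // discount rate
    private int[] greedyAction;   // stays null until learn() has run

    OfflineAgentSketch(double[][][] p, double[][][] r, double gamma) {
        this.p = p;
        this.r = r;
        this.gamma = gamma;
    }

    /** Action value of a in s, given the state values v. */
    private double q(double[] v, int s, int a) {
        double sum = 0.0;
        for (int n = 0; n < p.length; n++) {
            sum += p[s][a][n] * (r[s][a][n] + gamma * v[n]);
        }
        return sum;
    }

    /** Returns the first action with maximal action value in state s. */
    private int argMax(double[] v, int s) {
        int best = 0;
        for (int a = 1; a < p[s].length; a++) {
            if (q(v, s, a) > q(v, s, best)) {
                best = a;
            }
        }
        return best;
    }

    /** Offline learning: value iteration sweeps followed by greedy policy extraction. */
    void learn(int sweeps) {
        int numStates = p.length;
        double[] v = new double[numStates];
        for (int k = 0; k < sweeps; k++) {
            double[] next = new double[numStates];
            for (int s = 0; s < numStates; s++) {
                next[s] = q(v, s, argMax(v, s));
            }
            v = next;
        }
        int[] policy = new int[numStates];
        for (int s = 0; s < numStates; s++) {
            policy[s] = argMax(v, s);
        }
        greedyAction = policy;
    }

    /** Greedy action for state s; only valid after learn() has been called. */
    int act(int s) {
        if (greedyAction == null) {
            throw new IllegalStateException("call learn() before using the policy");
        }
        return greedyAction[s];
    }
}
```

Throwing an IllegalStateException before learning is one way to enforce the ordering; the library may handle this case differently.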
The classes PolicyIterationAgent and ValueIterationAgent both extend DPAgent. They have only a few parameters, and in most cases the user does not need to care about them.
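For orientation, here is a sketch of tabular policy iteration in its textbook form. It is not the code of PolicyIterationAgent: the class PolicyIterationSketch, the method solve, and the two parameters gamma (discount rate) and tol (stopping tolerance for policy evaluation) are assumptions chosen for illustration. Value iteration differs mainly in folding evaluation and improvement into a single backup per state.

```java
/** Hypothetical sketch of tabular policy iteration; not the library's PolicyIterationAgent. */
final class PolicyIterationSketch {
    private final double[][][] p; // p[s][a][sNext], transition probabilities
    private final double[][][] r; // r[s][a][sNext], transition rewards
    private final double gamma;   // discount rate, must be below 1 for this sketch
    private final double tol;     // stopping tolerance for policy evaluation

    PolicyIterationSketch(double[][][] p, double[][][] r, double gamma, double tol) {
        this.p = p;
        this.r = r;
        this.gamma = gamma;
        this.tol = tol;
    }

    /** Action value of a in s, given the state values v. */
    private double q(double[] v, int s, int a) {
        double sum = 0.0;
        for (int n = 0; n < p.length; n++) {
            sum += p[s][a][n] * (r[s][a][n] + gamma * v[n]);
        }
        return sum;
    }

    /** Alternates policy evaluation and greedy improvement until the policy is stable. */
    int[] solve() {
        int n = p.length;
        int[] policy = new int[n]; // start with action 0 in every state
        double[] v = new double[n];
        boolean stable;
        do {
            // Policy evaluation: iterate the Bellman equation for the fixed policy.
            double delta;
            do {
                delta = 0.0;
                for (int s = 0; s < n; s++) {
                    double updated = q(v, s, policy[s]);
                    delta = Math.max(delta, Math.abs(updated - v[s]));
                    v[s] = updated;
                }
            } while (delta > tol);

            // Policy improvement: switch only on a strict gain, so ties cannot cause cycling.
            stable = true;
            for (int s = 0; s < n; s++) {
                int best = policy[s];
                double bestQ = q(v, s, best);
                for (int a = 0; a < p[s].length; a++) {
                    double qa = q(v, s, a);
                    if (qa > bestQ + 1e-9) {
                        best = a;
                        bestQ = qa;
                    }
                }
                if (best != policy[s]) {
                    policy[s] = best;
                    stable = false;
                }
            }
        } while (!stable);
        return policy;
    }
}
```

The returned array holds one greedy action index per state.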
Example 12.21
We show an example that solves the GridWorld problem of [SB98]. (Note that most of the implementation effort lies in the environment class GridJumpEnvironment, which is not listed here.)
// Create agent settings:
RLAgentSettings agentSettings = new RLAgentSettings();
agentSettings.setInputDataSpecification(metaData);
agentSettings.setGamma(0.9);
agentSettings.verifySettings();
// Get default agent specification from 'agents.xml':
AgentSpecification agentSpecification =
    AgentSpecification.getAgentSpecification("PolicyIterationAgent");