// Apply policy 10 times:
for (int i = 0; i < 10; i++)
  System.out.println("next action: " + egp.nextAction());
// Result, e.g.:
// next action: action: 2.0 index = 1
// next action: action: 2.0 index = 1
// next action: action: 1.0 index = 0
// next action: action: 2.0 index = 1
// next action: action: 2.0 index = 1
// next action: action: 3.0 index = 2
// next action: action: 2.0 index = 1
...
■ Agent, Environment
The central class of the RL package is, of course, RLAgent. It extends the general Agent from Sect. 12.2.1. The generic parameter of RLAgent is Action because its apply and learnApply methods return Action objects. Unlike the general agent framework of XELOPES, the RL package contains a base Environment class.
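The class relationship just described can be illustrated with a minimal sketch. Note that these are hypothetical stand-ins, not the real XELOPES signatures: only the generic return type of the agent is taken from the text.

```java
// Hypothetical sketch of the relationship described above; the XELOPES
// signatures are assumptions, reduced to the generic return type.
abstract class Agent<T> {
    public abstract T apply(Object state);
}

class Action {
    final double value;
    Action(double value) { this.value = value; }
}

// RLAgent fixes the generic parameter to Action,
// so apply (and learnApply) return Action objects.
abstract class RLAgent extends Agent<Action> { }

// A trivial concrete agent that always proposes the same action.
class ConstantAgent extends RLAgent {
    @Override
    public Action apply(Object state) { return new Action(2.0); }
}

public class Main {
    public static void main(String[] args) {
        Action a = new ConstantAgent().apply(null);
        System.out.println("action: " + a.value);  // action: 2.0
    }
}
```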
Environment is an abstract class that extends EnvironmentInformation (Sect. 12.2.1) and implements the base environment functionality of the RL package.
RLAgent has an associated settings class RLAgentSettings that extends the general AgentSettings from Sect. 12.2.1. It stores some basic parameters like the discount rate and contains a description of the agent's metadata.
Further, RLAgent contains the variables vfunction for the state-value function, qfunction for the action-value function, and policy for the policy of the agent (not all of them must be used). It also holds a reference to its Environment. For the case where the agent knows its environment model (i.e., the transition probabilities and rewards), the variable envModel of RLAgent can be used. It is of the class EnvironmentModel, which contains interfaces to access the transition probabilities and rewards.
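Since the actual EnvironmentModel interface is not shown here, the following is a hypothetical tabular sketch of how such a model, exposing transition probabilities and expected rewards, could be used for a one-step Bellman backup of an action value. All names and signatures are assumptions for illustration.

```java
// Hypothetical sketch (not the actual XELOPES API): a tabular environment
// model exposing transition probabilities and expected rewards.
interface EnvironmentModel {
    double transitionProbability(int s, int a, int s2);
    double expectedReward(int s, int a, int s2);
}

class TabularModel implements EnvironmentModel {
    private final double[][][] p, r;  // indexed [s][a][s']
    TabularModel(double[][][] p, double[][][] r) { this.p = p; this.r = r; }
    public double transitionProbability(int s, int a, int s2) { return p[s][a][s2]; }
    public double expectedReward(int s, int a, int s2) { return r[s][a][s2]; }
}

public class Main {
    // One-step backup: Q(s,a) = sum over s' of p(s'|s,a) * (r(s,a,s') + gamma * V(s'))
    static double qValue(EnvironmentModel m, int s, int a,
                         double[] v, double gamma, int nStates) {
        double q = 0.0;
        for (int s2 = 0; s2 < nStates; s2++)
            q += m.transitionProbability(s, a, s2)
               * (m.expectedReward(s, a, s2) + gamma * v[s2]);
        return q;
    }

    public static void main(String[] args) {
        // Two states, one action: from state 0, a 50/50 chance of staying or
        // moving to state 1; moving pays reward 1.
        double[][][] p = {{{0.5, 0.5}}, {{0.0, 1.0}}};
        double[][][] r = {{{0.0, 1.0}}, {{0.0, 0.0}}};
        double[] v = {0.0, 2.0};
        double q = qValue(new TabularModel(p, r), 0, 0, v, 0.9, 2);
        // 0.5*(0 + 0.9*0) + 0.5*(1 + 0.9*2) = 1.4, up to floating-point rounding
        System.out.println("Q(0,0) = " + q);
    }
}
```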
The design of the Environment class was motivated by the RL implementation of Sutton and Santamaria [StSa96]. To this end, the following method is contained in Environment:
public abstract StateRewardVector step(Action action)
    throws MiningException;
This method will be called once by the simulation instance in each step of the simulation. step causes the environment to undergo a transition from its current state to a next state, dependent on the action. The method returns the next state and reward as a StateRewardVector object. If action is null, a new episode starts.
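The step contract described above can be sketched with a toy environment. The types below are minimal stand-ins for the XELOPES classes (the real StateRewardVector and MiningException are not reproduced here); only the step semantics, including the null-action episode reset, follow the text.

```java
// Minimal stand-ins for the XELOPES types, for illustration only.
class Action {
    final double value;
    Action(double value) { this.value = value; }
}

class StateRewardVector {
    final int state;
    final double reward;
    StateRewardVector(int state, double reward) {
        this.state = state;
        this.reward = reward;
    }
}

abstract class Environment {
    public abstract StateRewardVector step(Action action) throws Exception;
}

// A toy 1-D walk: each action shifts the state by its value;
// reaching state 5 pays reward 1.
class WalkEnvironment extends Environment {
    private int state = 0;

    @Override
    public StateRewardVector step(Action action) {
        if (action == null) {            // null action starts a new episode
            state = 0;
            return new StateRewardVector(state, 0.0);
        }
        state += (int) action.value;     // transition depends on the action
        double reward = (state == 5) ? 1.0 : 0.0;
        return new StateRewardVector(state, reward);
    }
}

public class Main {
    public static void main(String[] args) {
        WalkEnvironment env = new WalkEnvironment();
        env.step(null);                  // start episode
        StateRewardVector sr = null;
        for (int i = 0; i < 5; i++) sr = env.step(new Action(1.0));
        System.out.println("state=" + sr.state + " reward=" + sr.reward);
        // state=5 reward=1.0
    }
}
```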
The learnApply method of RLAgent, inherited from Agent, with a StateRewardVector object as argument serves as the counterpart to the step method from the agent side. It takes the next state and reward from the environment and