// Apply policy 10 times:
for (int i = 0; i < 10; i++)
  System.out.println("next action: " + egp.nextAction());
// Result, e.g.:
// next action: action: 2.0 index = 1
// next action: action: 2.0 index = 1
// next action: action: 1.0 index = 0
// next action: action: 2.0 index = 1
// next action: action: 2.0 index = 1
// next action: action: 3.0 index = 2
// next action: action: 2.0 index = 1
// ...
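The output illustrates greedy selection with occasional exploration: the action with index 1 is returned most of the time, while other actions appear now and then. For reference, the following self-contained sketch shows how such an ε-greedy selection rule works in principle; it is only an illustration and not the XELOPES implementation behind egp.

import java.util.Random;

// Illustrative epsilon-greedy selection rule (not the XELOPES implementation).
class EpsilonGreedySketch {
    private final double epsilon;    // exploration probability
    private final double[] qValues;  // estimated value of each action index
    private final Random rng = new Random();

    EpsilonGreedySketch(double epsilon, double[] qValues) {
        this.epsilon = epsilon;
        this.qValues = qValues;
    }

    // With probability epsilon pick a random action index, otherwise the greedy one.
    int nextActionIndex() {
        if (rng.nextDouble() < epsilon)
            return rng.nextInt(qValues.length);
        int best = 0;
        for (int i = 1; i < qValues.length; i++)
            if (qValues[i] > qValues[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        EpsilonGreedySketch policy =
            new EpsilonGreedySketch(0.2, new double[] {1.0, 3.0, 2.0});
        for (int i = 0; i < 10; i++)
            System.out.println("next action index: " + policy.nextActionIndex());
    }
}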
Agent, Environment
The central class of the RL package is, of course, RLAgent. It extends the general
Agent from Sect. 12.2.1. The generic type of RLAgent is Action because its apply and
learnApply methods return Action objects. Unlike the general agent framework of
XELOPES, the RL package contains a base Environment class. Environment is an
abstract class that extends EnvironmentInformation (Sect. 12.2.1) and implements
StateActionSet. Thus the complete interaction of Fig. 3.1 can be modeled by
the RL package.
RLAgent has an associated settings class RLAgentSettings that extends the
general AgentSettings from Sect. 12.2.1. It stores some basic parameters like the
discount rate γ and contains a description of the agent's metadata.
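As a brief reminder of what the discount rate γ controls, the following self-contained snippet computes the discounted return of a reward sequence, G = r1 + γ·r2 + γ²·r3 + ...; it only illustrates the meaning of the parameter and is not part of XELOPES.

// Illustration of the discount rate: the return is the discounted sum of rewards.
public class DiscountedReturn {
    public static double discountedReturn(double[] rewards, double gamma) {
        double g = 0.0;
        double factor = 1.0;
        for (double r : rewards) {
            g += factor * r;   // r_1 + gamma*r_2 + gamma^2*r_3 + ...
            factor *= gamma;
        }
        return g;
    }

    public static void main(String[] args) {
        double[] rewards = {1.0, 0.0, 2.0};
        System.out.println(discountedReturn(rewards, 0.9));  // 1 + 0 + 0.81*2 = 2.62
    }
}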
Further, RLAgent contains variables vfunction for the state-value function,
qfunction for the action-value function, and policy for the policy of the agent (not
all of them must be used). It also has a reference to its Environment. For the case where
the agent knows its environment model (i.e., the transition probabilities and rewards),
the variable envModel of RLAgent can be used. It is of the class EnvironmentModel,
which contains interfaces to access the transition probabilities and rewards.
The RL package also supports simulations in the spirit of Fig. 3.1 . The approach
was motivated by the RL implementation of Sutton and Santamaria [StSa96]. To
this end, the following method is contained in Environment :
public abstract StateRewardVector step(Action action)
throws MiningException;
This method will be called once by the simulation instance in each step of the
simulation. step causes the environment to undergo a transition from its current
state to a next state dependent on the action. The method returns the next state and
reward as a StateRewardVector object. If action is null, a new episode starts.
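To make the contract of step more concrete, the following toy sketch implements a one-dimensional walk environment with the same calling convention. The classes MiningException, Action, and StateRewardVector below are simplified stand-ins defined only so the sketch is self-contained; they are not the actual XELOPES classes.

// Simplified stand-ins for the XELOPES types (illustration only).
class MiningException extends Exception {
    MiningException(String message) { super(message); }
}

class Action {
    final double value;
    Action(double value) { this.value = value; }
}

class StateRewardVector {
    final double state;
    final double reward;
    StateRewardVector(double state, double reward) {
        this.state = state;
        this.reward = reward;
    }
}

// Toy environment: a bounded walk on the positions 0..10.
// As described above, a null action starts a new episode.
class LineWalkEnvironment {
    private double state;

    public StateRewardVector step(Action action) throws MiningException {
        if (action == null) {                         // start a new episode in the middle
            state = 5.0;
            return new StateRewardVector(state, 0.0);
        }
        state += (action.value >= 0) ? 1.0 : -1.0;    // move right or left
        state = Math.max(0.0, Math.min(10.0, state));
        double reward = (state == 10.0) ? 1.0 : 0.0;  // reward only at the right end
        return new StateRewardVector(state, reward);
    }
}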
The learnApply method of RLAgent, inherited from Agent, with a
StateRewardVector object as argument serves as the counterpart to the step method
on the agent's side. It takes the next state and reward from the environment and
returns the agent's next action.