The main methods of Policy are:
public Action nextAction() throws MiningException;
returns the next action according to the policy,
public abstract double probability(Action action) throws MiningException;
returns the probability p(s,a) of an action being taken.
Different subclasses extend Policy for different types of policies. The most
important policy class is GreedyPolicy, which selects the action(s) of the highest
action value (best action). The class EpsilonGreedyPolicy extends GreedyPolicy for
an ε-greedy policy. The class SoftmaxPolicy extends Policy for a softmax policy.
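The selection probabilities behind these policy types can be sketched in a few lines of plain Java. This is a minimal, library-independent illustration and not the implementation of the classes named above: the class name PolicyProbabilitySketch, its method names, and the temperature parameter are all hypothetical. It also assumes that the ε-greedy policy spreads the exploration probability ε uniformly over the non-greedy actions only; other conventions explore over all actions.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the library described in the text.
final class PolicyProbabilitySketch {

    // Epsilon-greedy probabilities for the action values q of one state.
    // Assumption: greedy actions share 1 - epsilon, all other actions share epsilon.
    static double[] epsilonGreedy(double[] q, double epsilon) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        int greedyCount = 0;
        for (double v : q) {
            if (v == max) greedyCount++;
        }
        int otherCount = q.length - greedyCount;
        double[] p = new double[q.length];
        for (int i = 0; i < q.length; i++) {
            if (q[i] == max) {
                // If every action is greedy there is nothing to explore.
                double mass = (otherCount == 0) ? 1.0 : 1.0 - epsilon;
                p[i] = mass / greedyCount;
            } else {
                p[i] = epsilon / otherCount;
            }
        }
        return p;
    }

    // Softmax (Boltzmann) probabilities: p_i proportional to exp(q_i / temperature).
    static double[] softmax(double[] q, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : q) {
            max = Math.max(max, v);
        }
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            // Subtracting the maximum avoids overflow and leaves the result unchanged.
            p[i] = Math.exp((q[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] q = {-1, 8, 5};
        System.out.println(Arrays.toString(epsilonGreedy(q, 0.2)));
        System.out.println(Arrays.toString(softmax(q, 1.0)));
    }
}
```

A pure greedy policy corresponds to epsilonGreedy with epsilon set to 0, in which case the whole probability mass falls on the action(s) with the highest value.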
Example 12.20 We give an example of an ε-greedy policy. The action set contains
three possible actions. The action values are defined through an action-value
function with the second action having maximum reward. For ε = 0.2, the ε-greedy
policy selects the “greedy” action 2 in 80 % of all calls of nextAction.
// Define action set:
double[] s1 = {0};
State st1 = new State(s1, 0); // state index 0
double[] a1 = {1};
Action act1 = new Action(a1);
double[] a2 = {2};
Action act2 = new Action(a2);
double[] a3 = {3};
Action act3 = new Action(a3);
ActionSet as = new ActionSet();
as.setState(st1);
as.addAction(act1); // action index 0, automatically assigned
as.addAction(act2); // action index 1, automatically assigned
as.addAction(act3); // action index 2, automatically assigned
// Define action-value function:
ActionValueFunction qfunction = new ActionValueFunction();
qfunction.setValue(st1, act1, -1);
qfunction.setValue(st1, act2, 8);
qfunction.setValue(st1, act3, 5);
// Define ε-greedy policy:
EpsilonGreedyPolicy egp = new EpsilonGreedyPolicy();
egp.setActionSet(as);
egp.setActionValueFunction(qfunction);
egp.setEpsilon(0.2);
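The 80 % figure stated in the example can be checked empirically with a small standalone simulation. The sketch below does not use the library classes above; the class EpsilonGreedySimulation and its methods are hypothetical stand-ins. It assumes that an exploration step picks uniformly among the non-greedy actions, which is the convention consistent with the stated 80 %; if exploration also included the greedy action, the observed frequency would be higher.

```java
import java.util.Random;

// Hypothetical stand-in for repeated nextAction() calls on an epsilon-greedy policy.
final class EpsilonGreedySimulation {

    // Index of the largest action value; ties resolve to the first maximum.
    static int greedyIndex(double[] q) {
        int best = 0;
        for (int i = 1; i < q.length; i++) {
            if (q[i] > q[best]) best = i;
        }
        return best;
    }

    // With probability 1 - epsilon return the greedy action,
    // otherwise a uniformly chosen non-greedy action (assumed convention).
    static int nextAction(double[] q, double epsilon, Random rng) {
        int greedy = greedyIndex(q);
        if (q.length == 1 || rng.nextDouble() >= epsilon) {
            return greedy;
        }
        int pick = rng.nextInt(q.length - 1);
        // Skip over the greedy index so only the other actions can be returned.
        return pick >= greedy ? pick + 1 : pick;
    }

    // Fraction of calls in which the greedy action was selected.
    static double greedyFrequency(double[] q, double epsilon, int calls, long seed) {
        Random rng = new Random(seed);
        int greedy = greedyIndex(q);
        int hits = 0;
        for (int i = 0; i < calls; i++) {
            if (nextAction(q, epsilon, rng) == greedy) hits++;
        }
        return (double) hits / calls;
    }

    public static void main(String[] args) {
        double[] q = {-1, 8, 5}; // action values of act1, act2, act3 from the example
        double f = greedyFrequency(q, 0.2, 100_000, 42L);
        System.out.printf("greedy action chosen in %.1f %% of calls%n", 100 * f);
    }
}
```

With a large number of calls the reported frequency should lie close to 1 − ε; the exact value depends on the random seed.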