Building a Recommendation Engine: The XELOPES Library - Realtime Data Mining


The main methods of Policy are:

public Action nextAction() throws MiningException;

returns the next action chosen according to the policy,

public abstract double probability(Action action) throws MiningException;

returns the probability p(s,a) of an action to be taken.
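One way to relate the two methods: if probability supplies p(s,a) for every action, a call to nextAction can be realized by drawing an action according to those probabilities. The following is a minimal, library-independent sketch of that idea in plain Java. The names ActionSampler and sampleIndex are hypothetical and not part of XELOPES, actions are represented only by their index, and the sketch makes no claim about how the library implements nextAction internally.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper (not part of XELOPES): draws an action index from given probabilities. */
final class ActionSampler {

    /**
     * Returns index i with probability probs[i] (inverse-CDF sampling).
     * Assumes probs is non-empty, non-negative, and sums to 1.
     */
    static int sampleIndex(double[] probs, Random rng) {
        double u = rng.nextDouble();   // uniform in [0, 1)
        double cumulative = 0.0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (u < cumulative) {
                return i;
            }
        }
        return probs.length - 1;       // guard against rounding error in the sum
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] probs = {0.1, 0.8, 0.1};   // illustrative probabilities for three actions
        int[] counts = new int[probs.length];
        for (int n = 0; n < 10_000; n++) {
            counts[sampleIndex(probs, rng)]++;
        }
        // Counts should be roughly proportional to probs.
        System.out.println(Arrays.toString(counts));
    }
}
```

With this construction, the relative frequency with which an index is returned approaches the probability assigned to it as the number of calls grows.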

Different subclasses extend Policy for different types of policies. The most important policy class is GreedyPolicy, which selects the action(s) of the highest action value (best action). The class EpsilonGreedyPolicy extends GreedyPolicy for an ε-greedy policy. The class SoftmaxPolicy extends Policy for a softmax policy.
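To make the difference between the three policy types concrete, the sketch below computes the action probabilities each of them would assign to a plain array of action values. It is written in standalone Java and does not use the XELOPES classes; PolicyProbabilities and its methods are hypothetical names. Several conventions are assumptions: ties are resolved in favour of the first maximal action, the ε-greedy variant spreads ε evenly over the non-greedy actions, and the softmax uses the Boltzmann form with a temperature parameter. The library's own classes may differ on any of these points.

```java
import java.util.Arrays;

/** Hypothetical illustration (not the XELOPES API). All methods assume a non-empty array q. */
final class PolicyProbabilities {

    /** Index of the largest action value; ties go to the first maximal entry. */
    static int argMax(double[] q) {
        int best = 0;
        for (int i = 1; i < q.length; i++) {
            if (q[i] > q[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Greedy: probability 1 on the best action, 0 elsewhere. */
    static double[] greedy(double[] q) {
        double[] p = new double[q.length];
        p[argMax(q)] = 1.0;
        return p;
    }

    /** Epsilon-greedy (assumed variant): 1 - epsilon on the best action, epsilon shared by the others. */
    static double[] epsilonGreedy(double[] q, double epsilon) {
        int n = q.length;
        if (n == 1) {
            return new double[] {1.0};
        }
        double[] p = new double[n];
        Arrays.fill(p, epsilon / (n - 1));
        p[argMax(q)] = 1.0 - epsilon;
        return p;
    }

    /** Softmax (assumed Boltzmann form): p_i proportional to exp(q_i / temperature). */
    static double[] softmax(double[] q, double temperature) {
        double max = q[argMax(q)];     // subtract the maximum for numerical stability
        double[] p = new double[q.length];
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            p[i] = Math.exp((q[i] - max) / temperature);
            sum += p[i];
        }
        for (int i = 0; i < q.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] q = {-1, 8, 5};       // illustrative action values
        System.out.println(Arrays.toString(greedy(q)));
        System.out.println(Arrays.toString(epsilonGreedy(q, 0.2)));
        System.out.println(Arrays.toString(softmax(q, 1.0)));
    }
}
```

For the values -1, 8 and 5 with ε = 0.2, the ε-greedy distribution under these assumptions is 0.1, 0.8 and 0.1 up to floating-point rounding, while the softmax distribution still favours the second action but gives every action a nonzero probability.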

Example 12.20 We give an example of an ε-greedy policy. The action set contains three possible actions. The action values are defined through an action-value function with the second action having maximum reward. For ε = 0.2, the ε-greedy policy selects the “greedy” action 2 in 80 % of all calls of nextAction.

// Define action set:
double[] s1 = {0};
State st1 = new State(s1, 0); // state index 0
double[] a1 = {1};
Action act1 = new Action(a1);
double[] a2 = {2};
Action act2 = new Action(a2);
double[] a3 = {3};
Action act3 = new Action(a3);
ActionSet as = new ActionSet();
as.setState(st1);
as.addAction(act1); // action index 0, automatically assigned
as.addAction(act2); // action index 1, automatically assigned
as.addAction(act3); // action index 2, automatically assigned

// Define action-value function:
ActionValueFunction qfunction = new ActionValueFunction();
qfunction.setValue(st1, act1, -1);
qfunction.setValue(st1, act2, 8);
qfunction.setValue(st1, act3, 5);

// Define epsilon-greedy policy:
EpsilonGreedyPolicy egp = new EpsilonGreedyPolicy();
egp.setActionSet(as);
egp.setActionValueFunction(qfunction);
egp.setEpsilon(0.2);
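The 80 % figure stated in Example 12.20 can be checked empirically without the library. The standalone sketch below repeats an ε-greedy draw many times for the action values -1, 8 and 5 and reports how often the greedy action (index 1) is chosen. EpsilonGreedySimulation and its methods are hypothetical names, and the sketch assumes that the exploration step picks uniformly among the non-greedy actions only, which is the convention consistent with the 80 % figure. If exploration drew from all three actions instead, the greedy share would be 1 - ε + ε/3, roughly 86.7 %.

```java
import java.util.Random;

/** Hypothetical simulation (not the XELOPES API) of repeated epsilon-greedy draws. */
final class EpsilonGreedySimulation {

    /** Index of the largest action value; ties go to the first maximal entry. */
    static int argMax(double[] q) {
        int best = 0;
        for (int i = 1; i < q.length; i++) {
            if (q[i] > q[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * One draw: the greedy action with probability 1 - epsilon, otherwise a
     * uniformly chosen non-greedy action (assumed exploration convention).
     */
    static int nextActionIndex(double[] q, double epsilon, Random rng) {
        int best = argMax(q);
        if (q.length == 1 || rng.nextDouble() >= epsilon) {
            return best;
        }
        int other = rng.nextInt(q.length - 1);   // index among the remaining actions
        return other >= best ? other + 1 : other; // skip over the greedy index
    }

    /** Fraction of calls in which the greedy action was selected. */
    static double greedyShare(double[] q, double epsilon, int calls, long seed) {
        Random rng = new Random(seed);
        int best = argMax(q);
        int hits = 0;
        for (int n = 0; n < calls; n++) {
            if (nextActionIndex(q, epsilon, rng) == best) {
                hits++;
            }
        }
        return (double) hits / calls;
    }

    public static void main(String[] args) {
        double[] q = {-1, 8, 5};   // action values from the example
        System.out.println(greedyShare(q, 0.2, 100_000, 7L));
    }
}
```

The printed share should lie close to 0.8, with the deviation shrinking as the number of calls increases.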
