Fig. 12.16 Symbolic vector representation of a state transition by taking an action (state 1, action, state 2)
Core: core and infrastructure
DP: dynamic programming algorithms of RL
MC: Monte Carlo algorithms of RL
TD: temporal-difference learning algorithms of RL
Approx: function approximation
MultiLevel: multilevel methods
Recomm: RL for recommendations
The RL algorithms implement the agent interface described in the previous
section. We now describe the central packages, except for the Recomm package,
which will be studied in Sect. 12.2.3.
12.2.2.1 Core
State, Action, Reward
The class State extends MiningVector. To avoid philosophical discussions about why a
state is a mining vector, we simply note that, since a state is used as the argument of the
state-value function v(s), which in turn is represented by a MiningModel (explained
below), it must be a mining vector. The class Action also extends MiningVector.
Figure 12.16 illustrates the motivation: an action is something that moves one state
into another. Since states are mining vectors (coordinate vectors), actions must be
mining vectors (transition vectors), too.
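
The following minimal sketch illustrates this vector view of Fig. 12.16: taking an action amounts to adding a transition vector to the current coordinate vector. The class and method names are hypothetical stand-ins, not the actual XELOPES API.

// Hypothetical sketch: states and actions as plain vectors, where taking an
// action corresponds to a vector translation (cf. Fig. 12.16).
public class StateTransitionSketch {

    // Successor state = coordinate-wise sum of the current state
    // (coordinate vector) and the action (transition vector).
    static double[] apply(double[] state, double[] action) {
        double[] next = new double[state.length];
        for (int i = 0; i < state.length; i++) {
            next[i] = state[i] + action[i];
        }
        return next;
    }

    public static void main(String[] args) {
        double[] state1 = {1.0, 0.0};              // "state 1"
        double[] action = {0.0, 1.0};              // "action" as transition vector
        double[] state2 = apply(state1, action);   // "state 2"
        System.out.println(java.util.Arrays.toString(state2)); // [1.0, 1.0]
    }
}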
More precisely, the classes State and Action do not extend MiningVector directly but
the class IndexedMiningVector, a mining vector with an index accessible via the
getIndex and setIndex methods. The index is useful for discrete state and action
sets S and A(s), respectively. Like almost all integer-valued quantities of the RL
implementation in XELOPES, the index uses long as its data type, because RL
problems may involve a huge number of states, actions, steps, etc., which could
exceed the range of native integers.
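
A rough sketch of such an indexed vector is shown below. Only the getIndex/setIndex accessors and the long-valued index are taken from the text; the field layout and everything else are assumptions, and the real IndexedMiningVector may look different.

// Minimal sketch in the spirit of IndexedMiningVector: a vector of values
// plus a long-valued index into a discrete set S or A(s).
public class IndexedVectorSketch {

    private final double[] values; // vector coordinates
    private long index;            // position in the discrete state/action set

    public IndexedVectorSketch(double[] values, long index) {
        this.values = values;
        this.index = index;
    }

    // Index of this state/action within its discrete set.
    public long getIndex() {
        return index;
    }

    // long is used because |S| or |A(s)| may exceed the int range.
    public void setIndex(long index) {
        this.index = index;
    }

    public double[] getValues() {
        return values;
    }
}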
The class Reward is mainly a wrapper class for a double value. Since a Reward
always contains a single value rather than an array, it does not extend MiningVector.
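
As an illustration only, a reward wrapper of this kind could look as follows; the actual XELOPES Reward class may expose a different interface.

// Illustrative sketch of a reward wrapper holding a single double value.
public class RewardSketch {

    private final double value; // the scalar reward r

    public RewardSketch(double value) {
        this.value = value;
    }

    public double getValue() {
        return value;
    }

    @Override
    public String toString() {
        return "Reward(" + value + ")";
    }
}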