Fig. 12.16 Symbolic vector representation of a state transition by taking an action (state 1 → action → state 2)
• Core: core and infrastructure
• DP: dynamic programming algorithms of RL
• MC: Monte Carlo algorithms of RL
• TD: temporal-difference learning algorithms of RL
• Approx: function approximation
• MultiLevel: multilevel methods
• Recomm: RL for recommendations
The RL algorithms implement the agent interface described in the previous section. We now describe the central packages except for the Recomm package, which will be studied in Sect. 12.2.3.
12.2.2.1 Core
State, Action, Reward
The class State extends MiningVector. To avoid a philosophical discussion of why a state is a mining vector, we simply note that since a state serves as the argument of the state-value function v(s), which in turn is represented by a MiningModel (explained below), it must be a mining vector. The class Action also extends MiningVector. Figure 12.16 illustrates the motivation: an action is something that moves one state into another. Since states are mining vectors (coordinate vectors), actions must be mining vectors (transition vectors), too!
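The transition-vector idea can be sketched with plain double arrays. This is only an illustration: the real MiningVector API is not reproduced in the text, so the `apply` helper below is a hypothetical stand-in for whatever vector arithmetic XELOPES provides.

```java
import java.util.Arrays;

public class TransitionSketch {
    // An action is a transition vector: adding it to a state
    // (a coordinate vector) yields the successor state.
    // Hypothetical helper; not part of the XELOPES API.
    static double[] apply(double[] state, double[] action) {
        double[] next = new double[state.length];
        for (int i = 0; i < state.length; i++) {
            next[i] = state[i] + action[i];
        }
        return next;
    }

    public static void main(String[] args) {
        double[] state1 = {1.0, 0.0};
        double[] action = {0.0, 1.0};
        double[] state2 = apply(state1, action);
        System.out.println(Arrays.toString(state2)); // [1.0, 1.0]
    }
}
```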
The classes State and Action do not extend MiningVector directly but rather the class IndexedMiningVector, a mining vector with an index accessible via the getIndex and setIndex methods. The index is useful for discrete state and action sets S and A(s), respectively. Like almost all integer-like quantities in the XELOPES RL implementation, the index uses long as its data type, because in RL the number of states, actions, steps, etc., may be so huge that it cannot be stored in a native integer.
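The hierarchy just described might look roughly as follows. This is a hedged sketch: the real IndexedMiningVector extends MiningVector, whose fields and methods are not shown in the text, so only the index-related part is modeled here.

```java
// Simplified stand-in for the XELOPES class; the real one extends MiningVector.
class IndexedMiningVector {
    // long rather than int: discrete state/action sets in RL can be
    // too large to enumerate with native 32-bit integers.
    private long index;

    public long getIndex() { return index; }
    public void setIndex(long index) { this.index = index; }
}

class State extends IndexedMiningVector { }

class Action extends IndexedMiningVector { }

public class IndexDemo {
    public static void main(String[] args) {
        State s = new State();
        s.setIndex(5_000_000_000L); // exceeds Integer.MAX_VALUE, fits in a long
        System.out.println(s.getIndex()); // 5000000000
    }
}
```

Using long throughout avoids silent overflow when state or step counters grow past roughly 2.1 billion, at the cost of slightly larger objects.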
The class Reward is mainly a wrapper class for a double value. Because Reward always contains a single value rather than an array, it does not extend