2.2.1.3 Action Space Dimensions
This section describes characteristics that are related to the action space of the domain,
or those that are most closely associated with the actions of the agent.
Action Space Continuity Action space continuity refers to whether the actions
of the agent are discrete or continuous. Often there is some relation between the
continuity of the state space and the action space, where discrete state spaces have
discrete action spaces and continuous state spaces have continuous action spaces, but
this is not always the case. The most common pairing is a discrete state space with a
discrete action space, as in most games. Continuous state domains
are often control-type problems or have some underlying real-world dynamics, and
these can use either discrete actions (e.g., the mountain car domain) or continuous actions
(e.g., robot control).
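The two pairings above can be contrasted in a minimal sketch. The three throttle actions for mountain car match that domain's standard formulation; the torque bounds for the robot-control case are assumed values chosen only for illustration.

```python
import random

# Mountain car (continuous states, discrete actions): three throttle choices.
MOUNTAIN_CAR_ACTIONS = [-1, 0, 1]  # push left, no push, push right

def sample_discrete_action():
    """Sample uniformly from a discrete action space."""
    return random.choice(MOUNTAIN_CAR_ACTIONS)

# Robot control (continuous states, continuous actions): a bounded joint torque.
TORQUE_LOW, TORQUE_HIGH = -2.0, 2.0  # assumed torque limits, for illustration

def sample_continuous_action():
    """Sample uniformly from a continuous (interval-bounded) action space."""
    return random.uniform(TORQUE_LOW, TORQUE_HIGH)

print(sample_discrete_action())    # one of -1, 0, 1
print(sample_continuous_action())  # a real number in [-2.0, 2.0]
```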
Branching Factor The branching factor refers to the number of actions that can be
taken from any state. Some domains have constant branching factors where there
is a constant number of possible actions that can be taken from any and all states.
Other domains have non-constant branching factors where the number of actions
from any state may increase or decrease depending on the state, and the branching
factor therefore takes on a distribution over the state space. The branching factor can
also be thought of as a form of constraint in the sense that the state trajectory of the
agent is somewhat guided or constrained, rather than allowing for the entire state
space to be reached from any other state.
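A state-dependent branching factor can be illustrated with a tic-tac-toe-like board (a hypothetical example, not one from the text): the number of legal actions equals the number of empty cells, so it shrinks as the episode progresses.

```python
# The branching factor from a state is the size of its legal-action set.
def legal_actions(board):
    """Return indices of empty cells; the list's length is the branching factor."""
    return [i for i, cell in enumerate(board) if cell == " "]

empty_board = [" "] * 9
mid_game = ["X", "O", " ", "X", " ", " ", "O", " ", " "]

print(len(legal_actions(empty_board)))  # 9 actions from the initial state
print(len(legal_actions(mid_game)))     # 5 actions remain
```

A domain with a constant branching factor would instead return the same number of actions for every state.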
2.2.1.4 Reward Dimension
Reward characteristics are specifically related to the reward function. As mentioned,
while the term reward has a positive connotation, this term is used for any type
of feedback provided to the agent and could therefore be aversive (i.e., negative).
Loosely, a reward is any concrete information provided to the agent that is
indicative of the true value or quality of being in a state or following a trajectory.
Reward Stochasticity Reward stochasticity refers to whether the rewards
for any particular state are deterministic or stochastic. More specifically, this char-
acteristic specifies if rewards are provided every time the agent visits a particular
state or if rewards are provided with some probability. The vast majority of domains
use deterministic reward functions where the reward is fixed to a particular state or a
group of states. However, domains may also provide rewards a fraction of the time,
thus providing relatively less feedback to the agent. Note that this characteristic
refers neither to how many states have rewards nor to the magnitude of the reward(s);
those characteristics are defined next.
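The distinction can be sketched as two reward functions over the same states. The goal-state label and the probability p below are assumptions made only for this illustration.

```python
import random

GOAL_STATE = "goal"  # hypothetical state label for illustration

def deterministic_reward(state):
    """Reward is fixed to the state: paid on every visit to the goal."""
    return 1.0 if state == GOAL_STATE else 0.0

def stochastic_reward(state, p=0.3, rng=random.random):
    """Same reward state, but feedback arrives only with probability p."""
    return 1.0 if state == GOAL_STATE and rng() < p else 0.0

print(deterministic_reward("goal"))  # always 1.0
print(stochastic_reward("goal"))     # 1.0 about 30% of visits, else 0.0
```

Both functions attach reward to the same state; only the reliability of the feedback differs, which is exactly the characteristic described above.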
Reward Distribution The reward distribution refers to how the rewards are dis-
tributed over the state space, as well as the magnitude of these rewards. Some domains
have a single reward state (e.g., the mountain car problem), whereas other domains