Some domains may have multiple states that provide rewards of equal magnitude, such as in Tic-tac-toe, where there are many ways to win the game. Still other domains may have multiple reward states where the magnitude of the reward depends on the state, as in the game of backgammon, where there are different types of wins.
Though there could be countless types of reward distributions, we define a number
of cases that represent some of those found in common domains. (1) There is a single
state with a positively-valued reward. (2) There is a single state with a positively-valued
reward and a single state with a negatively-valued reward, each of the same
magnitude. (3) There are multiple states with either positively-valued or negatively-valued
rewards, where all rewards are of the same magnitude. (4) There are multiple
states with distributions of either positively-valued or negatively-valued rewards,
where the magnitude may depend on the state. A small sketch contrasting these cases
is given below.
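To make these cases concrete, the following minimal sketch expresses each case as a reward function over terminal states; the state labels and reward magnitudes are purely illustrative assumptions rather than values taken from any particular domain.

```python
# Illustrative reward functions for the four distribution cases.
# State labels and magnitudes are hypothetical; they only serve to
# contrast the structure of each case.

def reward_case_1(state):
    # (1) A single positively-valued reward state.
    return 1.0 if state == "goal" else 0.0

def reward_case_2(state):
    # (2) One positive and one negative reward state of equal magnitude.
    return {"win": 1.0, "loss": -1.0}.get(state, 0.0)

def reward_case_3(state):
    # (3) Multiple reward states, all of the same magnitude
    #     (e.g., the many winning lines in Tic-tac-toe).
    wins = {"row_win", "column_win", "diagonal_win"}
    losses = {"row_loss", "column_loss", "diagonal_loss"}
    if state in wins:
        return 1.0
    if state in losses:
        return -1.0
    return 0.0

def reward_case_4(state):
    # (4) Multiple reward states whose magnitudes depend on the state
    #     (e.g., the different types of wins in backgammon).
    return {"single_win": 1.0, "gammon_win": 2.0,
            "single_loss": -1.0, "gammon_loss": -2.0}.get(state, 0.0)
```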
A special case of a reward distribution is that of a reward hierarchy. In this case,
rewards are provided en route to the primary goal state in order to shape or guide
the agent's trajectory in large and complex domains. Essentially, relatively small
rewards placed within the domain act as subgoals, and these subgoals lead the agent
to the main goal of the domain, where there is some large positively-valued reward.
One example of this is the hierarchical office domain of Bakker
and Schmidhuber (2004), for which a specific learning algorithm was developed.
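As a rough illustration of such a hierarchy (and not of the specific algorithm of Bakker and Schmidhuber), the following sketch places small subgoal rewards on the way to a large terminal reward; the states, subgoal locations, and reward values are assumptions made only for the example.

```python
# A minimal sketch of a hierarchical reward in a grid-like domain.
# Subgoal locations and reward values are hypothetical; they illustrate
# small rewards placed en route to a large terminal reward.

SUBGOAL_REWARD = 0.1   # small rewards that guide the trajectory
GOAL_REWARD = 10.0     # large positively-valued reward at the main goal

subgoals = {(2, 3), (5, 7)}   # hypothetical waypoint states
goal = (9, 9)                 # hypothetical main goal state

def hierarchical_reward(state, visited_subgoals):
    """Return the reward for entering `state`.

    `visited_subgoals` tracks subgoals already rewarded, so each one
    pays out only once along a trajectory.
    """
    if state == goal:
        return GOAL_REWARD
    if state in subgoals and state not in visited_subgoals:
        visited_subgoals.add(state)
        return SUBGOAL_REWARD
    return 0.0
```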
Reward Stationarity
Reward stationarity refers to whether, and how, the reward distribution
changes over time, thus adding a temporal dimension to the reward distribution.
In most domains, rewards are stationary and do not change over time. However, it is
possible for rewards to be non-stationary, for example decreasing over
time in order to encourage the agent to reach a goal more efficiently.
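For instance, a non-stationary reward of this kind can be sketched as a function of both the state and the elapsed time step; the decay schedule and values below are hypothetical.

```python
# Hypothetical non-stationary reward: the goal reward decays with the
# number of elapsed time steps, encouraging the agent to reach the goal
# sooner rather than later.

INITIAL_GOAL_REWARD = 1.0
DECAY_PER_STEP = 0.01

def nonstationary_reward(state, time_step):
    if state == "goal":
        # Reward shrinks linearly with time, but never becomes negative.
        return max(0.0, INITIAL_GOAL_REWARD - DECAY_PER_STEP * time_step)
    return 0.0
```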
2.2.1.5
State Encoding-Dependent Characteristics
As noted, some of the characteristics defined above depend on the state encoding
scheme. Most evidently, the state encoding scheme could affect the state
space dimensionality and the state space complexity. This may or may not change
the branching factor, depending on whether actions are derived from the raw
state encoding or from a novel one. While novel state encoding schemes
will not be considered in this work, it is important to acknowledge that the state
encoding scheme affects a number of characteristics, and this changes the
dynamics of the learning problem.
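As a simple illustration of this dependence, the same Tic-tac-toe position can be encoded in at least two ways with different dimensionalities; both encodings below are illustrative assumptions rather than schemes used in this work.

```python
import numpy as np

# Two hypothetical encodings of the same Tic-tac-toe position, showing how
# the encoding scheme changes the state-space dimensionality presented to
# the learner.

board = [1, 0, -1,    # 1 = X, -1 = O, 0 = empty
         0, 1, 0,
         -1, 0, 1]

# Raw encoding: one dimension per cell (9 dimensions).
raw_state = np.array(board, dtype=np.float32)

# Alternative encoding: one-hot over cell contents (27 dimensions),
# trading higher dimensionality for a sparser state description.
one_hot_state = np.zeros(27, dtype=np.float32)
for cell, value in enumerate(board):
    one_hot_state[cell * 3 + (value + 1)] = 1.0

print(raw_state.shape)      # (9,)
print(one_hot_state.shape)  # (27,)
```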
2.2.2
Representations
Reinforcement learning requires some form of memory device in order to
retain and update state values. The type of memory device, or representation as it will
be called here, is often chosen based on qualities of the environment that seem to pair
well with the representation. Many different representations have been used
in reinforcement learning, some more dominant than others, and this section
briefly reviews several of them, including look-up tables,
linear methods, and neural networks.
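As a rough sketch of the first two of these, a look-up table stores one value per distinct state, while a linear method stores a weight vector over state features; the update rule shown is a generic TD(0)-style value update with illustrative parameter values and a hypothetical feature extractor, not a specific method from this chapter.

```python
import numpy as np
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.9   # discount factor (illustrative value)

# Look-up table: one stored value per distinct (hashable) state.
table_values = defaultdict(float)

def tabular_update(state, reward, next_state):
    td_target = reward + GAMMA * table_values[next_state]
    table_values[state] += ALPHA * (td_target - table_values[state])

# Linear method: the value is a weighted sum of state features, so the
# memory is a single weight vector rather than one entry per state.
NUM_FEATURES = 4
weights = np.zeros(NUM_FEATURES)

def features(state):
    # Hypothetical feature extractor for a scalar state; a real one
    # depends on the domain and encoding scheme.
    return np.array([1.0, state, state ** 2, state ** 3])

def linear_update(state, reward, next_state):
    td_target = reward + GAMMA * (weights @ features(next_state))
    td_error = td_target - weights @ features(state)
    weights[:] += ALPHA * td_error * features(state)
```

A neural network plays the same role as the linear method but replaces the fixed feature map and weight vector with learned, nonlinear layers.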