Some domains may have multiple states that provide rewards of equal magnitude, such as in Tic-tac-toe, where there are many ways to win the game. Still other domains may have multiple reward states where the magnitude of the reward depends on the state, as in the game of backgammon, where there are different types of wins.
Though there could be countless types of reward distributions, we define a number
of cases that represent some of those found in common domains. (1) There is a single
state with a positively-valued reward. (2) There is a single state with a positively-valued
reward and a single state with a negatively-valued reward, each of the same
magnitude. (3) There are multiple states with either positively-valued or negatively-valued
rewards, where all rewards are of the same magnitude. (4) There are multiple
states with distributions of either positively-valued or negatively-valued rewards,
where the magnitude may depend on the state. A small sketch contrasting these cases
is given below.
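To make these cases concrete, the following minimal sketch expresses each case as a reward function over terminal states; the state labels and reward magnitudes are purely illustrative assumptions rather than values taken from any particular domain.

```python
# Illustrative reward functions for the four distribution cases.
# State labels and magnitudes are hypothetical; they only serve to
# contrast the structure of each case.

def reward_case_1(state):
    # (1) A single positively-valued reward state.
    return 1.0 if state == "goal" else 0.0

def reward_case_2(state):
    # (2) One positive and one negative reward state of equal magnitude.
    return {"win": 1.0, "loss": -1.0}.get(state, 0.0)

def reward_case_3(state):
    # (3) Multiple reward states, all of the same magnitude
    #     (e.g., the many winning lines in Tic-tac-toe).
    wins = {"row_win", "column_win", "diagonal_win"}
    losses = {"row_loss", "column_loss", "diagonal_loss"}
    if state in wins:
        return 1.0
    if state in losses:
        return -1.0
    return 0.0

def reward_case_4(state):
    # (4) Multiple reward states whose magnitudes depend on the state
    #     (e.g., the different types of wins in backgammon).
    return {"single_win": 1.0, "gammon_win": 2.0,
            "single_loss": -1.0, "gammon_loss": -2.0}.get(state, 0.0)
```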
A special case of a reward distribution is that of a reward hierarchy. In this case,
rewards are provided en route to the primary goal state in order to shape or guide
the agent's trajectory in large and complex domains. Essentially, relatively small
rewards placed within the domain act as subgoals, and these subgoals lead the agent
to the main goal of the domain, where there is some large positively-valued reward.
One example of this is the hierarchical office domain of Bakker
and Schmidhuber (2004), for which a specific learning algorithm was developed.
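As a rough illustration of such a hierarchy (and not of the specific algorithm of Bakker and Schmidhuber), the following sketch places small subgoal rewards on the way to a large terminal reward; the states, subgoal locations, and reward values are assumptions made only for the example.

```python
# A minimal sketch of a hierarchical reward in a grid-like domain.
# Subgoal locations and reward values are hypothetical; they illustrate
# small rewards placed en route to a large terminal reward.

SUBGOAL_REWARD = 0.1   # small rewards that guide the trajectory
GOAL_REWARD = 10.0     # large positively-valued reward at the main goal

subgoals = {(2, 3), (5, 7)}   # hypothetical waypoint states
goal = (9, 9)                 # hypothetical main goal state

def hierarchical_reward(state, visited_subgoals):
    """Return the reward for entering `state`.

    `visited_subgoals` tracks subgoals already rewarded, so each one
    pays out only once along a trajectory.
    """
    if state == goal:
        return GOAL_REWARD
    if state in subgoals and state not in visited_subgoals:
        visited_subgoals.add(state)
        return SUBGOAL_REWARD
    return 0.0
```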
Reward Stationarity
Reward stationarity refers to whether, and how, the reward distribution
changes over time, thus adding a temporal dimension to the reward distribution.
In most domains, rewards are stationary and do not change over time. However, it is
possible for rewards to be non-stationary, for example decreasing over
time in order to encourage the agent to reach a goal more efficiently.
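For instance, a non-stationary reward of this kind can be sketched as a function of both the state and the elapsed time step; the decay schedule and values below are hypothetical.

```python
# Hypothetical non-stationary reward: the goal reward decays with the
# number of elapsed time steps, encouraging the agent to reach the goal
# sooner rather than later.

INITIAL_GOAL_REWARD = 1.0
DECAY_PER_STEP = 0.01

def nonstationary_reward(state, time_step):
    if state == "goal":
        # Reward shrinks linearly with time, but never becomes negative.
        return max(0.0, INITIAL_GOAL_REWARD - DECAY_PER_STEP * time_step)
    return 0.0
```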
2.2.1.5
State Encoding-Dependent Characteristics
As noted, some of the characteristics defined above depend on the state encoding
scheme. Most evidently, the state encoding scheme could affect the state
space dimensionality and the state space complexity. This may or may not change
the branching factor, depending on whether actions are derived from the raw
state encoding or from a novel one. While novel state encoding schemes
will not be considered in this work, it is important to acknowledge that the state
encoding scheme affects a number of characteristics, and this changes the
dynamics of the learning problem.
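As a simple illustration of this dependence, the same Tic-tac-toe position can be encoded in at least two ways with different dimensionalities; both encodings below are illustrative assumptions rather than schemes used in this work.

```python
import numpy as np

# Two hypothetical encodings of the same Tic-tac-toe position, showing how
# the encoding scheme changes the state-space dimensionality presented to
# the learner.

board = [1, 0, -1,    # 1 = X, -1 = O, 0 = empty
         0, 1, 0,
         -1, 0, 1]

# Raw encoding: one dimension per cell (9 dimensions).
raw_state = np.array(board, dtype=np.float32)

# Alternative encoding: one-hot over cell contents (27 dimensions),
# trading higher dimensionality for a sparser state description.
one_hot_state = np.zeros(27, dtype=np.float32)
for cell, value in enumerate(board):
    one_hot_state[cell * 3 + (value + 1)] = 1.0

print(raw_state.shape)      # (9,)
print(one_hot_state.shape)  # (27,)
```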
2.2.2
Representations
Reinforcement learning requires some form of memory device in order to
retain and update state values. The type of memory device, or representation as it will
be called here, is often chosen based on qualities of the environment that seem to pair
well with the representation. Many different representations have been used
in reinforcement learning, some more dominant than others, and this section
briefly reviews several of them, including look-up tables,
linear methods, and neural networks.
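As a rough sketch of the first two of these, a look-up table stores one value per distinct state, while a linear method stores a weight vector over state features; the update rule shown is a generic TD(0)-style value update with illustrative parameter values and a hypothetical feature extractor, not a specific method from this chapter.

```python
import numpy as np
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.9   # discount factor (illustrative value)

# Look-up table: one stored value per distinct (hashable) state.
table_values = defaultdict(float)

def tabular_update(state, reward, next_state):
    td_target = reward + GAMMA * table_values[next_state]
    table_values[state] += ALPHA * (td_target - table_values[state])

# Linear method: the value is a weighted sum of state features, so the
# memory is a single weight vector rather than one entry per state.
NUM_FEATURES = 4
weights = np.zeros(NUM_FEATURES)

def features(state):
    # Hypothetical feature extractor for a scalar state; a real one
    # depends on the domain and encoding scheme.
    return np.array([1.0, state, state ** 2, state ** 3])

def linear_update(state, reward, next_state):
    td_target = reward + GAMMA * (weights @ features(next_state))
    td_error = td_target - weights @ features(state)
    weights[:] += ALPHA * td_error * features(state)
```

A neural network plays the same role as the linear method but replaces the fixed feature map and weight vector with learned, nonlinear layers.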