characteristics. Stationarity refers to a process whose statistical properties do not change over time. A domain may have a stochastic element governed by some underlying distribution, for example, the distribution of the sensor noise that affects the agent's perception of the current state. If this distribution does not change over time, the domain is stationary, whereas if it does change over time, the domain is non-stationary.
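As a minimal sketch of this distinction (not taken from the text), the snippet below simulates a sensor whose noise is drawn from a fixed Gaussian in the stationary case and from a Gaussian whose spread drifts with time in the non-stationary case; the Gaussian noise model and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stationary_noise(t):
    # Stationary: the noise distribution (mean 0, std 0.1) is identical at every time step.
    return rng.normal(loc=0.0, scale=0.1)

def non_stationary_noise(t):
    # Non-stationary: the standard deviation grows with time, so the
    # statistical properties of the noise change as learning proceeds.
    return rng.normal(loc=0.0, scale=0.1 + 0.001 * t)

true_angle = 0.05  # hypothetical true pole angle (radians)
for t in range(1000):
    obs_fixed = true_angle + stationary_noise(t)      # distribution never changes
    obs_drift = true_angle + non_stationary_noise(t)  # distribution widens with t
```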
Observability Observability refers to what information is available (observable)
to the agent during the learning process. If all of the relevant state information
(i.e., information that is necessary to learn how to behave optimally in the domain)
is available to the agent, the domain is said to be fully observable. If not all of the
relevant state information is available to the agent, the domain is partially observable,
and the decision process is no longer considered to be Markovian. Examples of
partial observability include missing state information, sensor noise, and perceptual
aliasing. Using the cart-pole balancing task as an example, missing state information could mean that only the pole angle is available to the agent, whereas both the pole angle and angular velocity are traditionally provided in this problem. Sensor noise is where the agent's observation of the state is noisy, e.g., the observed pole angle and angular velocity differ from their true values. Perceptual aliasing is where multiple states appear identical given the agent's current perceptual information, and only by using previous state information can these states be differentiated (Bakker et al. 2002).
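These three forms of partial observability can be made concrete with a small sketch; the state ordering, noise model, and bin width below are illustrative assumptions rather than the formulation of any particular cart-pole implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full cart-pole state:
# [cart position, cart velocity, pole angle, pole angular velocity]
state = np.array([0.0, 0.2, 0.05, -0.1])

def observe_missing(state):
    # Missing state information: the agent sees only the pole angle,
    # not the angular velocity or the cart variables.
    return state[2]

def observe_noisy(state, noise_std=0.01):
    # Sensor noise: the full state is observed, but corrupted by (assumed
    # Gaussian) noise, so observed values differ from the true values.
    return state + rng.normal(scale=noise_std, size=state.shape)

def observe_aliased(state, bin_width=0.5):
    # Perceptual aliasing: coarse discretisation maps many distinct states
    # to the same observation, so they cannot be distinguished from the
    # current observation alone.
    return tuple(np.floor(state / bin_width).astype(int))
```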
A large portion of the reinforcement learning literature focuses on fully observable domains because they are simple and amenable to theoretical analysis (Jaakkola et al. 2003); however, Bakker et al. (2002) suggest that partially observable domains are more similar to real-world problems. Partial observability can have a significant impact on the ability of an agent to learn. Intuitively, with partial observability, the performance of learning algorithms based on the Markov property (e.g., temporal difference methods) suffers because their underlying assumptions about the environment are violated. As a result, algorithms have been developed to handle partially observable Markov decision processes (POMDPs) (Baxter and Bartlett 2000), but these are not considered in this work.
2.2.1.2 State Space Dimensions
This section describes characteristics related to the state space of the domain, that is, those most closely associated with the numerical representation of the state, the state vector.
State Space Continuity State space continuity refers to whether the state space is
discrete or continuous. Many reinforcement learning problems have discrete state
spaces (e.g., Gridworld, games, etc.). Those that have continuous state spaces are
often based on some (potentially real-world) dynamic system (e.g., mountain car,
pendulum balancing, helicopter control, etc.). It is also important to acknowledge
the underlying state transition process of discrete and continuous state spaces. State