characteristics. Stationarity refers to a process whose statistical properties do not change over time. A domain may have a stochastic element governed by some underlying distribution, for example, the distribution of the sensor noise that affects the agent's perception of the current state. If this distribution does not change over time, the domain is stationary, whereas if it does change over time, the domain is non-stationary.
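As a minimal sketch of this distinction (not taken from the text), the snippet below simulates a sensor whose noise is drawn from a fixed Gaussian in the stationary case and from a Gaussian whose spread drifts with time in the non-stationary case; the Gaussian noise model and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stationary_noise(t):
    # Stationary: the noise distribution (mean 0, std 0.1) is identical at every time step.
    return rng.normal(loc=0.0, scale=0.1)

def non_stationary_noise(t):
    # Non-stationary: the standard deviation grows with time, so the
    # statistical properties of the noise change as learning proceeds.
    return rng.normal(loc=0.0, scale=0.1 + 0.001 * t)

true_angle = 0.05  # hypothetical true pole angle (radians)
for t in range(1000):
    obs_fixed = true_angle + stationary_noise(t)      # distribution never changes
    obs_drift = true_angle + non_stationary_noise(t)  # distribution widens with t
```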
Observability Observability refers to what information is available (observable)
to the agent during the learning process. If all of the relevant state information
(i.e., information that is necessary to learn how to behave optimally in the domain)
is available to the agent, the domain is said to be fully observable. If not all of the
relevant state information is available to the agent, the domain is partially observable,
and the decision process is no longer considered to be Markovian. Examples of
partial observability include missing state information, sensor noise, and perceptual
aliasing. Using the cart-pole balancing task as an example, missing state information could mean that only the pole angle is available to the agent, whereas both the pole angle and angular velocity are traditionally provided in this problem. Sensor noise is where the agent's observation of the state is noisy, e.g., the observed pole angle and angular velocity differ from their true values. Perceptual aliasing is where multiple states appear identical given the agent's current perceptual information, and only by using previous state information can these states be differentiated (Bakker et al. 2002).
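These three forms of partial observability can be made concrete with a small sketch; the state ordering, noise model, and bin width below are illustrative assumptions rather than the formulation of any particular cart-pole implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full cart-pole state:
# [cart position, cart velocity, pole angle, pole angular velocity]
state = np.array([0.0, 0.2, 0.05, -0.1])

def observe_missing(state):
    # Missing state information: the agent sees only the pole angle,
    # not the angular velocity or the cart variables.
    return state[2]

def observe_noisy(state, noise_std=0.01):
    # Sensor noise: the full state is observed, but corrupted by (assumed
    # Gaussian) noise, so observed values differ from the true values.
    return state + rng.normal(scale=noise_std, size=state.shape)

def observe_aliased(state, bin_width=0.5):
    # Perceptual aliasing: coarse discretisation maps many distinct states
    # to the same observation, so they cannot be distinguished from the
    # current observation alone.
    return tuple(np.floor(state / bin_width).astype(int))
```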
A large portion of the reinforcement learning literature focuses on fully observable domains because they are simple and amenable to theoretical analysis (Jaakkola et al. 2003); however, Bakker et al. (2002) suggest that partially observable domains are more similar to real-world problems. Partial observability can have a significant impact on the ability of an agent to learn. Intuitively, with partial observability, the performance of learning algorithms based on the Markov property (e.g., temporal difference methods) suffers because their underlying assumptions about the environment are violated. As a result, algorithms have been developed to handle partially observable Markov decision processes (POMDPs) (Baxter and Bartlett 2000), but these are not considered in this work.
2.2.1.2 State Space Dimensions
This section describes characteristics related to the state space of the domain, that is, those most closely associated with the numerical representation of the state, the state vector.
State Space Continuity State space continuity refers to whether the state space is
discrete or continuous. Many reinforcement learning problems have discrete state
spaces (e.g., Gridworld, games, etc.). Those that have continuous state spaces are
often based on some (potentially real-world) dynamic system (e.g., mountain car,
pendulum balancing, helicopter control, etc.). It is also important to acknowledge
the underlying state transition process of discrete and continuous state spaces. State