[Figure: tree of domain characteristics, branching into general characteristics, state space, action space, and rewards, refined into time horizon, number of agents, stationarity, complexity, continuity, dimensionality, observability, stochasticity, distribution, encoding scheme, and branching factor.]

Fig. 2.7 Domains can be characterized by general, state space, action space, and reward characteristics, each of which can be considered at a finer level.
Time Horizon The time horizon refers to the length of time over which learning occurs. Decision processes generally have one of two types of horizon: finite or infinite. Finite-horizon problems, which are the ones considered in this work and are also known as episodic problems, have some termination criterion that ends an episode, which may be a finite time limit or, more often, reaching some absorbing state. These are the types of problems that classical reinforcement learning considers almost exclusively. Infinite-horizon problems have no absorbing state; instead, the learning and decision-making processes extend infinitely in time. Such problems are often geared toward real-world or business-like domains, such as truck routing optimization, and are more associated with the stochastic optimization or approximate dynamic programming communities (Powell 2007).
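The episodic structure described above can be sketched in a few lines. The domain below is a hypothetical one-dimensional random walk invented purely for illustration (the state space, constants, and function names are assumptions, not part of any benchmark): the episode terminates on reaching an absorbing boundary state or, as a fallback, on hitting a finite time limit.

```python
import random

# Illustrative 1-D random-walk domain (an assumption for this sketch):
# states 0..N, where states 0 and N are absorbing (terminal).
N = 10
TIME_LIMIT = 1000  # finite time limit as a fallback termination criterion

def is_absorbing(state):
    """An episode ends on reaching an absorbing state."""
    return state == 0 or state == N

def run_episode(start=N // 2):
    """Run one episode of a finite-horizon (episodic) problem.

    Termination is triggered either by reaching an absorbing state
    or, less commonly, by exhausting the finite time limit.
    """
    state, steps = start, 0
    while not is_absorbing(state) and steps < TIME_LIMIT:
        action = random.choice([-1, +1])  # random policy, for illustration
        state += action
        steps += 1
    return state, steps
```

An infinite-horizon variant of the same loop would simply have no absorbing states and no time limit, running (in principle) forever while learning online.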
Number of Agents This characteristic refers to the number of agents present in the environment. If there are multiple agents, the agents also interact with each other. Many reinforcement learning problems, especially benchmark problems, have a single agent (e.g., Gridworld, mountain car), as do most control problems. Other problems, such as games, often have two agents with opposing goals. In still other problems, more than two agents may compete for their own goals or for a single goal, or teams of agents may compete for a single goal or opposing goals (Littman 2001). As stated, this dimension is also tied to the number of goals, which will be defined later. In this work, only single-agent domains are considered.
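The two-agent, opposing-goals case can be made concrete with matching pennies, a standard zero-sum game of the kind studied in multi-agent reinforcement learning (Littman 2001). The payoff table below is the classic one; the function name is an illustrative assumption.

```python
# Matching pennies: two agents with strictly opposing goals.
# Each agent plays "H" or "T"; agent A wins if the choices match,
# agent B wins otherwise. Rewards always sum to zero.

def rewards(action_a, action_b):
    """Return (reward_a, reward_b) for a joint action."""
    r_a = 1 if action_a == action_b else -1
    return r_a, -r_a
```

Because the rewards sum to zero for every joint action, any gain for one agent is exactly the other agent's loss, which is what makes the goals opposing rather than merely different.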
Domain Stationarity Domain stationarity refers to how any characteristic or structural property of the domain changes over the course of agent-environment interactions. A stationary domain has characteristics that do not change, whereas a non-stationary domain has characteristics that do change with time. It is important to distinguish stationarity from stochasticity, which is used to describe other domain