2.2.2.1 Look-up Tables
Look-up tables are the most basic form of representation for reinforcement learning.
These tables store information (i.e., state values) for each and every state. The
look-up indices may be drawn either from the raw state space or from derived/hand-coded
features. This type of representation could be considered the most localized type of
representation because of the unique state-value mapping: a state-value update
influences only the individual state visited at that time. The values of all unvisited
states are unaffected, which makes this representation immune to updates of neighboring
or nearby states. The properties of look-up tables can be rigorously analyzed, and it
can be proved that some learning algorithms converge under various conditions and
assumptions, such as each state being sampled infinitely many times and the learning
rates being set appropriately (Watkins 1989; Watkins and Dayan 1992; Jaakkola et al.
2003; Tsitsiklis and Roy 1996).
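To make this locality concrete, the following sketch (not taken from the text; the toy chain environment, step sizes, and helper names are illustrative assumptions) shows tabular Q-learning in Python, where each (state, action) pair owns its own table entry and an update touches only the entry that was just visited:

    # A minimal sketch of a look-up-table representation: tabular Q-learning
    # on a hypothetical 5-state chain, where each (state, action) pair has its
    # own entry and an update changes only the visited entry.
    import random
    from collections import defaultdict

    N_STATES = 5                 # hypothetical chain of 5 discrete states
    ACTIONS = [-1, +1]           # move left or right
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

    Q = defaultdict(float)       # look-up table: (state, action) -> value

    def step(s, a):
        """Toy dynamics: reward 1 for reaching the right end, else 0."""
        s_next = max(0, min(N_STATES - 1, s + a))
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        return s_next, reward, s_next == N_STATES - 1

    for episode in range(500):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection from the table
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s_next, r, done = step(s, a)
            # Q-learning update: only the single visited entry (s, a) changes;
            # values of all unvisited states are untouched (no generalization).
            best_next = 0.0 if done else max(Q[(s_next, act)] for act in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s_next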
Despite the attractive properties of look-up tables, they do have some disadvantages.
The primary disadvantage of look-up tables is that they are limited to relatively
small and discrete state spaces, both because of the memory constraints of computers
and because of the computationally intensive task of having to search through all
entries in the table (Baird 1995; Atkeson et al. 1997). In some cases, a simpler and
more compact representation (e.g., a neural network) can accurately model the value
function instead, and it can do so with far fewer parameters than the unique entries
required by a look-up table (Moody and Tresp 1994; Ollington et al. 2009). For
reinforcement learning applications to be scaled to real-world problems, representations
more compact than look-up tables are required (Singh et al. 1995). However, look-up
tables can handle large state spaces using some approximation methods, one of which
is state aggregation, in which similar states are assumed to have similar values
(Tsitsiklis and Roy 1996). Look-up tables can also be used for continuous state
spaces, although the states must be discretized in some manner, and this may not
be straightforward. Another disadvantage of look-up tables is that learning can be
slow due to the sheer number of states and the fact that state values are updated
individually; in other words, look-up tables have no ability to generalize state-value
updates (Tesauro 1992; Ollington et al. 2009).
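As an illustration of discretization and state aggregation (a hypothetical sketch; the two-dimensional state, bin edges, and update rule are assumptions, not from the text), a continuous state can be mapped to a coarse cell index so that all states falling in the same cell share a single table entry:

    # A minimal sketch of handling a continuous state space with a look-up
    # table via discretization/state aggregation: each continuous state is
    # mapped to a bin index, and nearby states share one aggregated entry.
    import numpy as np

    # hypothetical bin edges for a 2-D continuous state (position, velocity)
    POS_BINS = np.linspace(-1.0, 1.0, 9)    # 10 position intervals
    VEL_BINS = np.linspace(-2.0, 2.0, 9)    # 10 velocity intervals

    # aggregated value table: one entry per (position cell, velocity cell)
    V = np.zeros((len(POS_BINS) + 1, len(VEL_BINS) + 1))

    def to_index(pos, vel):
        """Map a continuous state to the index of its aggregate cell."""
        return np.digitize(pos, POS_BINS), np.digitize(vel, VEL_BINS)

    def td_update(state, reward, next_state, alpha=0.1, gamma=0.95):
        # TD(0)-style update on the aggregate cell: all states falling in the
        # same cell are assumed to share the same value (state aggregation).
        i, j = to_index(*state)
        i2, j2 = to_index(*next_state)
        V[i, j] += alpha * (reward + gamma * V[i2, j2] - V[i, j])

    td_update((0.12, -0.4), 0.0, (0.15, -0.35))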
Perhaps the earliest use of look-up tables for reinforcement learning is the work
of Michie and Chambers (1968), in which 162 'boxes' (i.e., entries in the look-up
table) are used to approximate the value function for the cart-pole balancing task.
While look-up tables could be viewed as a relatively primitive representation, they
have not been completely replaced by other representations and are still used in
contemporary research. Whiteson and Stone (2006) used a look-up table for adaptive
job routing/scheduling, and Nevmyvaka et al. (2006) used this representation for
optimizing trade execution. A more recent application proved that look-up tables
can converge to the optimal policy for best-match learning, a hybrid sparse model-
based and model-free Q-learning approach (van Seijen et al. 2011). Additionally,
Nissen (2007) used look-up tables for playing blackjack with multiple learning
algorithms, including Q(λ), Sarsa(λ) and Q-Sarsa(λ). Another use of look-up tables
is for comparative purposes, benchmarking other representations. Wiering (1995)