Meta-learning of Exploration/Exploitation Strategies: The Multi-armed Bandit Case - Agents and Artificial Intelligence

Information Technology Reference

In-Depth Information

F ::= B ( F,F ) | U ( F ) | V | C

B ::= + |−|×|÷|min | max

U ::= sqrt | ln | abs | opposite | inverse

V ::= r k | σ k | t k | t

C ::= 1 , 2 , 3 , 5 , 7

Fig. 1. The grammar used for generating candidate index functions and two example formula

parse trees corresponding to r k +2 /t k and r k + 2 ln ( t ) /t k

- or an atomic variable F = V ,where V belongs to a set of variables

- or a constant F = C ,where C belongs to a set of constants

In the following, we consider a set of operators and constants that provides a good

compromise between high expressiveness and low cardinality of

.Thesetofbinary

operators considered in this paper

includes the four elementary mathematic opera-

tions and the min and max operators:

{

+ ,

−

, min , max

}

. The set of unary

operators

contains the square root, the logarithm, the absolute value, the opposite and

= √ ., ln( . ) ,|.|,−., . . The set of variables

the inverse:

is:

{r k , σ k ,t k ,t}

The set of constants

has been chosen to maximize the number of different numbers

representable by small formulas. It is defined as

Figure 1 summarizes our grammar of formulas and gives two examples of index

functions. The length of a formula length ( f ) is the number of symbols occurring in the

{

1 , 2 , 3 , 5 , 7

}

formula. For example, the length of r k +2 /t k is 5 and the length of r k + 2

ln ( t ) /t k

is 9. Let L be a given maximal length. Θ is the subset of formulas whose length is no

more than L : Θ =

and Π Θ is the set of index-based policies whose

index functions are defined by formulas f

{

length ( f )

≤

}

∈

Θ .

5.2

Optimisation Algorithm

We now discuss the optimization of Equation 7 in the case of our symbolic parame-

terization. First, notice that several different formulas can lead to the same policy. For

example, any increasing function of r k defines the greedy policy, which always selects

the arm that is believed to be the best. Examples of such functions in our formula search

space include r k , r k ×

r k or √ r k .

Since it is useless to evaluate equivalent policies multiple times, we propose the

following two-step approach. First, the set Θ is partitioned into equivalence classes, two

formulas being equivalent if and only if they lead to the same policy. Then, Equation

7 is solved over the set of equivalence classes (which is typically one or two orders of

magnitude smaller than the initial set Θ ).

2 , r k ×

Partitioning Θ . This task is far from trivial: given a formula, equivalent formulas can

be obtained through commutativity, associativity, operator-specific rules and through

any increasing transformation. Performing this step exactly involves advanced static

analysis of the formulas, which we believe to be a very difficult solution to implement.

Agents and Artificial Intelligence

Search WWH ::

Custom Search

Home