Information Technology Reference
In-Depth Information
Algorithm 2 . EDA-based learning of a discrete bandit policy
Given the number of iterations i max ,
Given the population size n p ,
Given the number of best elements b ,
Given a sample of training bandit problems P (1) ,...,P ( N ) ,
Given an history-features function φ ( ·, · )
d ,
1: Set μ p =0 p =1 , ∀p ∈ [1 ,d ]
Initialize with normal Gaussians
2: for i ∈ [1 ,i max ] do
3:
for j ∈ [1 ,n p ] do
Sample and evaluate new population
4:
for p ∈ [1 ,d ] do
N ( μ p p )
θ p
5:
sample from
6:
end for
Estimate Δ ( π θ ) and store result ( θ,Δ ( π θ ))
7:
8:
end for
(1) ,...,θ ( b ) }
9:
Select
the b best candidate θ vectors w.r.t. their Δ ( · ) score
μ p b j =1 θ ( j p , ∀p ∈ [1 ,d ]
10:
Learn new Gaussians
σ p b j =1 ( θ ( j )
− μ p ) 2 , ∀p ∈ [1 ,d ]
11:
p
12: end for
13: return The policy π θ that led to the lowest observed value of Δ ( π θ )
( μ p p ) for each parameter θ p . Although this approach is
simple, it proved to be quite effective experimentally to solve Equation 7. The full de-
tails of our EDA-based policy learning procedure are given by Algorithm 2. The initial
distributions are standard Gaussian distributions
N
one Gaussian distribution
(0 , 1) . The policy that is returned
corresponds to the θ parameters that led to the lowest observed value of Δ ( π θ ) .
N
5
Symbolic Parametrization
The index functions from the literature depend on the current time step t andonthree
statistics extracted from the sub-history H t− 1 : r k , σ k and t k . We now propose a second
parameterization of our learning approach, in which we consider all index functions that
can be constructed using small formulas built upon these four variables.
5.1
Policy Search Space
We consider index functions that are given in the form of small, closed-form formulas.
Closed-form formulas have several advantages: they can be easily computed, they can
formally be analyzed and they are easily interpretable.
Let us first explicit the set of formulas
F
that we consider in this paper. A formula
F
is:
- either a binary expression F = B ( F ,F ) ,where B belongs to a set of binary
operators
F
and F and F are also formulas from
,
- or a unary expression F = U ( F ) where U belongs to a set of unary operators
B
F
U
and F F
,
Search WWH ::




Custom Search