Instead, we propose a simple approximate solution, which consists of discriminating formulas by comparing how they rank (in terms of the values returned by the formula) a set of d random samples of the variables r_k, σ_k, t_k, t. More formally, the procedure is the following:
1. we first build Θ, the space of all formulas f such that length(f) ≤ L;
2. for i = 1,...,d, we uniformly draw (within their respective domains) some random realizations of the variables r_k, σ_k, t_k, t that we concatenate into a vector Θ_i;
3. we cluster all formulas from Θ according to the following rule: two formulas F and F′ belong to the same cluster if and only if they rank all the Θ_i points in the same order, i.e.:

∀ i, j ∈ {1,...,d}, i ≠ j : F(Θ_i) ≥ F(Θ_j) ⟺ F′(Θ_i) ≥ F′(Θ_j).

Formulas leading to invalid index functions (caused for instance by division by zero or logarithm of negative values) are discarded;
4. among each cluster, we select one formula of minimal length;
5. we gather all the selected minimal-length formulas into an approximated reduced set of formulas Θ̃ (this reduction step is sketched below).
In the following, we denote by M the cardinality of the approximate set of formulas Θ̃ = {f_1, ..., f_M}.
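The sketch below, in Python, illustrates one way this reduction could be implemented. It assumes that each candidate formula is available as a callable taking a sample vector (r_k, σ_k, t_k, t) and that its length is known; the function name reduce_formula_space, the value of d, and the sampling domains are illustrative assumptions, not details from the text.

import math
import random

def reduce_formula_space(formulas, d=1024, seed=0):
    """Approximate reduction of the formula space.

    `formulas` is a list of (formula, length) pairs, where each formula is a
    callable mapping a sample vector (r_k, sigma_k, t_k, t) to a real value.
    Formulas are clustered by the way they rank d random sample points; one
    minimal-length representative is kept per cluster.
    """
    rng = random.Random(seed)
    # Step 2: draw d random realizations of (r_k, sigma_k, t_k, t).
    # The sampling domains below are placeholders, not the ones used in the text.
    samples = [(rng.random(),          # r_k in [0, 1]
                rng.random(),          # sigma_k in [0, 1]
                rng.randint(1, 1000),  # t_k
                rng.randint(1, 1000))  # t
               for _ in range(d)]

    clusters = {}  # ranking signature -> (length, formula)
    for formula, length in formulas:
        try:
            values = [formula(theta) for theta in samples]
            if any(math.isnan(v) or math.isinf(v) for v in values):
                continue  # invalid index function: discard
        except (ZeroDivisionError, ValueError, OverflowError):
            continue      # e.g. division by zero or logarithm of a negative value
        # Step 3: two formulas belong to the same cluster iff they induce the
        # same ordering of the samples; a rank vector (ties sharing a rank)
        # encodes that ordering and serves as the cluster key.
        order = sorted(range(d), key=lambda i: values[i])
        ranks, current = [0] * d, 0
        for pos, i in enumerate(order):
            if pos > 0 and values[i] != values[order[pos - 1]]:
                current = pos
            ranks[i] = current
        signature = tuple(ranks)
        # Steps 4-5: keep one formula of minimal length per cluster.
        kept = clusters.get(signature)
        if kept is None or length < kept[0]:
            clusters[signature] = (length, formula)

    return [formula for _, formula in clusters.values()]

Under these assumptions, applying reduce_formula_space to an enumeration of Θ yields a set playing the role of Θ̃.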
Optimization Algorithm. A naive approach for finding the best formula in Θ̃ would be to evaluate Δ(f) for each formula f ∈ Θ̃ and simply return the best one. While extremely simple to implement, such an approach could turn out to be time-inefficient when the space Θ̃ has large cardinality.
Preliminary experiments have shown us that Θ̃ contains a majority of formulas leading to relatively poorly performing index-based policies. It turns out that relatively few samples of R_{P^(i),T} are sufficient to reject these poorly performing formulas with high confidence. To exploit this observation, it is natural to formalize the search for the best formula as another multi-armed bandit problem. To each formula f_k ∈ Θ̃, we associate an arm. Pulling arm k consists of selecting a training problem P^(i) and running one episode with the index-based policy whose index formula is f_k. This yields a reward associated with arm k whose value is the quantity −R_{P^(i),T} observed during the episode; a sketch of such a pull is given below. The purpose of multi-armed bandit algorithms is here to process the sequence of observed rewards so as to select in a smart way the next formula to be tried, so that when the budget of pulls has been exhausted, one (or several) high-quality formula(s) can be identified.
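To make the notion of a pull concrete, here is a hedged sketch of what running one episode could look like on a Bernoulli training problem: the reward handed back to the meta-bandit is −R_{P^(i),T}, computed here as minus the expected cumulative regret of the index-based policy that uses the formula. The problem representation (a list of arm means), the initialization phase, and the name pull_arm are assumptions made for the example.

import math
import random

def pull_arm(formula, problem_means, T, rng):
    """One pull of the meta-bandit arm attached to `formula`: run a single
    episode of the corresponding index-based policy on a Bernoulli training
    problem with arm means `problem_means`, and return -R_{P,T}.
    """
    # Illustrative Bernoulli setup; the actual training problems are the P^(i)
    # described in the text.
    K = len(problem_means)
    counts = [0] * K      # t_k: number of plays of arm k
    means = [0.0] * K     # r_k: empirical mean reward of arm k
    sq_sums = [0.0] * K   # running sum of squared rewards (to estimate sigma_k)
    best_mean = max(problem_means)
    regret = 0.0

    for t in range(1, T + 1):
        if t <= K:
            k = t - 1  # play each arm once before relying on the index
        else:
            def index(j):
                var = max(0.0, sq_sums[j] / counts[j] - means[j] ** 2)
                return formula((means[j], math.sqrt(var), counts[j], t))
            k = max(range(K), key=index)

        reward = 1.0 if rng.random() < problem_means[k] else 0.0
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]
        sq_sums[k] += reward ** 2
        regret += best_mean - problem_means[k]  # expected regret of this play

    return -regret

A single pull of meta-arm k on training problem P^(i) would then amount to calling pull_arm(f_k, means_of_problem_i, T, rng).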
In the formalization of Equation 7 as a multi-armed bandit problem, only the quality of the finally suggested arm matters. How to select arms so as to identify the best one in a finite amount of time is known as the pure exploration multi-armed bandit problem [11]. It has been shown that index-based policies relying on upper confidence bounds are good policies for solving pure exploration bandit problems. Our optimization procedure works as follows: we use a bandit algorithm such as UCB1-TUNED during a given number of steps and then return the policy that corresponds to the formula f_k with the highest expected reward r_k. The problem instances are selected depending on the number of times the arm has been played so far: at each step, we select the training problem P^(i) with i = 1 + (t_k mod N).
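The sketch below shows one way such a search loop could be organized: UCB1-TUNED is run over the M formula-arms for a fixed budget of pulls, each pull selects the training problem i = 1 + (t_k mod N) (0-based in the code), and the formula with the highest empirical mean reward is returned. The callable pull(k, i) stands for one episode, for instance the pull_arm sketch above with rewards rescaled to [0, 1], since UCB1-TUNED assumes bounded rewards; the function name, the rescaling, and the fixed-budget stopping rule are assumptions of this sketch.

import math

def search_best_formula(pull, M, N, budget):
    """Pure-exploration search over M formula-arms with UCB1-TUNED.

    `pull(k, i)` runs one episode of the index-based policy of formula f_k
    on training problem P^(i) (0-based) and returns a reward assumed to lie
    in [0, 1]. Returns the index of the formula with the highest empirical
    mean reward once the budget of pulls is exhausted.
    """
    counts = [0] * M     # t_k: number of pulls of formula-arm k
    means = [0.0] * M    # empirical mean reward of formula-arm k
    sq_sums = [0.0] * M  # running sum of squared rewards of formula-arm k

    def ucb1_tuned(k, n):
        # UCB1-TUNED upper confidence bound (Auer et al., 2002).
        var = max(0.0, sq_sums[k] / counts[k] - means[k] ** 2)
        v = var + math.sqrt(2.0 * math.log(n) / counts[k])
        return means[k] + math.sqrt((math.log(n) / counts[k]) * min(0.25, v))

    for n in range(1, budget + 1):
        if n <= M:
            k = n - 1  # pull every formula-arm once before using the bound
        else:
            k = max(range(M), key=lambda j: ucb1_tuned(j, n))
        i = counts[k] % N  # training problem: i = 1 + (t_k mod N), 0-based here
        reward = pull(k, i)
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]
        sq_sums[k] += reward ** 2

    return max(range(M), key=lambda k: means[k])

Combined with the previous sketch, pull could for instance be defined as lambda k, i: 1.0 + pull_arm(formulas[k], problems[i], T, rng) / T, which maps −R_{P^(i),T} ∈ [−T, 0] into [0, 1] before it is fed to UCB1-TUNED.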