4.2 Regression analysis
Gent and Walsh carried out an empirical study of GSAT, an approximation algorithm for SAT, and applied regression analysis to model how the cost of obtaining a solution grows with problem size (Gent 1997).
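A minimal sketch of this kind of growth model, fitting a power-law form by least squares in log-log space; the cost measurements and the functional form are illustrative assumptions, not values from the original study:

```python
# Sketch: modelling solution cost as a function of problem size via regression.
# The data below are hypothetical placeholders.
import numpy as np

# Hypothetical observations: problem sizes n and mean search cost to solve
sizes = np.array([50, 100, 150, 200, 250, 300])
mean_cost = np.array([1.2e3, 5.1e3, 1.4e4, 3.0e4, 5.6e4, 9.5e4])

# Fit a power-law growth model cost = a * n^b by linear regression in log-log space
b, log_a = np.polyfit(np.log(sizes), np.log(mean_cost), deg=1)
a = np.exp(log_a)
print(f"fitted growth model: cost = {a:.3g} * n^{b:.2f}")

# Extrapolate the expected cost for a larger instance
print("predicted cost at n=500:", a * 500 ** b)
```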
In (Cruz 1999), PĂ©rez and Cruz present a statistical method for building algorithm performance models as polynomial functions that relate performance to problem size. The method first generates a representative sample of the algorithms' performance; the performance functions are then determined by regression analysis and finally incorporated into an algorithm selection mechanism. The polynomial functions are used to predict the algorithm that best satisfies the user's requirements.
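The selection idea can be illustrated as follows; the instance sizes, runtimes and polynomial degree are illustrative assumptions, not values from (Cruz 1999):

```python
# Sketch: fit one polynomial performance function per algorithm, then pick the
# algorithm with the best predicted performance for a new problem size.
import numpy as np

sizes = np.array([100, 200, 400, 800, 1600])
observed_time = {                       # hypothetical runtimes in seconds
    "algorithm_A": np.array([0.4, 1.1, 3.9, 14.8, 60.2]),
    "algorithm_B": np.array([1.0, 1.9, 4.1, 9.0, 20.5]),
}

# Polynomial performance functions obtained by regression (degree 2 here)
models = {name: np.polyfit(sizes, t, deg=2) for name, t in observed_time.items()}

def select_algorithm(new_size):
    # Predict runtime with each performance function and return the fastest
    predictions = {name: np.polyval(p, new_size) for name, p in models.items()}
    return min(predictions, key=predictions.get)

print(select_algorithm(3200))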
The performance of the local search algorithms Novelty and SAPS on instances of the SAT problem was analyzed in (Hutter 2006). The authors used linear regression with linear and quadratic basis functions to build prediction models. First, they built a prediction model from problem features and algorithm performance to predict the algorithm's run time. Second, they built another prediction model from problem features, algorithm parameter settings and algorithm performance. This model is used to automatically adjust the algorithm's parameters on a per-instance basis in order to optimize its performance.
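A rough sketch of such a model, using a quadratic basis-function expansion with ordinary linear regression on synthetic features, parameter settings and runtimes; the feature set and data are assumptions, not those of (Hutter 2006):

```python
# Sketch: runtime prediction with linear and quadratic basis functions, then
# per-instance parameter tuning by scanning candidate settings.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_runs = 200

# Two instance features plus one algorithm parameter per run -- all synthetic
X = rng.uniform(size=(n_runs, 3))
log_runtime = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 - X[:, 2] + rng.normal(0, 0.1, n_runs)

# Linear regression on a linear + quadratic basis-function expansion of the inputs
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(X, log_runtime)

# Per-instance tuning: for fixed instance features, keep the candidate parameter
# value with the lowest predicted (log) runtime.
instance_features = np.array([0.3, 0.7])
candidates = np.linspace(0.0, 1.0, 11)
grid = np.column_stack([np.tile(instance_features, (len(candidates), 1)), candidates])
best = candidates[np.argmin(model.predict(grid))]
print("best predicted parameter setting:", best)
```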
4.3 Functions of probability distribution
Frost finds that the performance of algorithms for solving CSP instances can be approximated by two standard families of continuous probability distributions (Frost 1997). Solvable instances can be modeled by the Weibull distribution and unsolvable instances by the lognormal distribution. He uses four parameters to generate instances: the number of constraints, the number of prohibited value pairs per constraint, the probability that a constraint exists between any pair of variables (each constraint being statistically independent of the others), and the probability that a value in the domain of one variable of a constraint is incompatible with a value in the domain of the other variable of the constraint.
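As an illustration, the two distribution families can be fitted to cost samples with standard statistical tools; the samples below are synthetic placeholders, not Frost's data:

```python
# Sketch: fitting Weibull and lognormal distributions to search-cost samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
solvable_costs = rng.weibull(1.5, size=500) * 1000          # synthetic data
unsolvable_costs = rng.lognormal(mean=7.0, sigma=0.5, size=500)

# Weibull fit for solvable instances (location fixed at 0)
shape, loc, scale = stats.weibull_min.fit(solvable_costs, floc=0)
print("Weibull shape/scale:", shape, scale)

# Lognormal fit for unsolvable instances
s, loc, scale = stats.lognorm.fit(unsolvable_costs, floc=0)
print("lognormal sigma/median:", s, scale)
```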
Hoos and StĂĽtzle present work similar to Frost's. They find that the performance of algorithms that solve SAT instances can be characterized by an exponential distribution (Hoos 2000). The run-time distribution is determined by executing an algorithm k times on a set of instances of the same family, using a high cutoff time as the stopping criterion and recording, for each successful run, the execution time required to find a solution. The empirical run-time distribution is the cumulative distribution associated with these observations; it maps an execution time t (given by the user) to the probability of finding a solution within that time. A family is a set of instances that share the same values of the parameters considered critical for performance.
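A small sketch of constructing such an empirical run-time distribution from k runs, assuming synthetic run times and a fixed cutoff:

```python
# Sketch: empirical run-time distribution (RTD). Run the algorithm k times with
# a generous cutoff, keep the run times of successful runs, and read off the
# probability of finding a solution within a given time t.
import numpy as np

k = 100
cutoff = 60.0                                             # cutoff time in seconds
run_times = np.sort(np.random.default_rng(2).exponential(scale=5.0, size=k))
successful = run_times[run_times <= cutoff]               # unsuccessful runs are censored

def solve_probability(t):
    # Fraction of the k runs that found a solution within time t
    return np.searchsorted(successful, t, side="right") / k

print("estimated P(solved within 10 s):", solve_probability(10.0))
```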
An algorithm portfolio architecture was proposed in (Silverthorn 2010). This architecture employs three core components: a portfolio of algorithms; a generative model, which is fit to data on those algorithms' past performance and then used to predict their future performance; and a policy for action selection, which repeatedly chooses algorithms based on those predictions. Portfolio operation begins with offline training, in which a) training tasks are