Fig. 15.1 Taxonomy of statistical learning theory
problem, which serves as a reference for studying the properties of the different surrogate loss functions used by various learning algorithms. Usually the average surrogate loss on the training data is called the empirical surrogate risk (denoted $\hat{R}_\phi$), the expected surrogate loss on the entire product space of input and output is called the expected surrogate risk (denoted $R_\phi$), the average true loss on the training data is called the empirical true risk (denoted $\hat{R}_0$), and the expected true loss on the entire product space of input and output is called the expected true risk (denoted $R_0$).
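To make these four quantities concrete, the following is a minimal sketch (not from the original text) that computes the two empirical risks for a linear scoring function on a binary classification sample, assuming the hinge loss as the surrogate loss and the 0-1 loss as the true loss; all names and data are illustrative.

```python
import numpy as np

def empirical_risks(w, X, y):
    """Empirical surrogate risk (hinge loss) and empirical true risk (0-1 loss)
    of a linear scoring function f(x) = w . x on a training sample (X, y)."""
    margins = y * (X @ w)                       # y_i * f(x_i)
    surrogate = np.maximum(0.0, 1.0 - margins)  # hinge loss: max(0, 1 - margin)
    true_loss = (margins <= 0).astype(float)    # 0-1 loss: 1 iff sign(f(x)) != y
    return surrogate.mean(), true_loss.mean()   # \hat{R}_phi and \hat{R}_0

# Toy usage: 100 instances in R^5 with labels in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w = rng.normal(size=5)
y = np.sign(X @ w + 0.5 * rng.normal(size=100))
r_phi, r_0 = empirical_risks(w, X, y)
print(f"empirical surrogate risk = {r_phi:.3f}, empirical true risk = {r_0:.3f}")
```

The expected risks $R_\phi$ and $R_0$ are the corresponding expectations under the (unknown) data distribution; in practice they can only be estimated, e.g., on a large held-out sample.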
With the above settings, there are three major tasks in machine learning: the selection of the training data (sampled from the input and output spaces), the selection of the hypothesis space, and the selection of the surrogate loss function. In the figure, we refer to them as training set construction, model selection, and surrogate loss selection, respectively.
In order to select appropriate training data (e.g., to determine the number of training instances needed to guarantee a given test performance of the learned model), one needs to study the Generalization Ability of the algorithm as a function of the number of training instances. The generalization analysis of an algorithm is concerned with whether, and at what rate, its empirical risk converges to its expected risk as the number of training samples approaches infinity. Alternatively, the generalization ability of an algorithm is sometimes represented by a bound on the difference between its expected risk and its empirical risk; one then asks whether, and at what rate, this bound converges to zero as the number of training samples approaches infinity. In general, an algorithm is regarded as better than another if its empirical risk converges to the expected risk while that of the other does not. Furthermore, among algorithms whose empirical risks do converge, the one with the faster convergence rate is regarded as better.
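As a concrete instance of such a bound (illustrative, not taken from the original text): for a single fixed hypothesis $f$ whose true loss is bounded in $[0, 1]$, Hoeffding's inequality gives, with probability at least $1 - \delta$ over the draw of $n$ i.i.d. training samples,

$$ R_0(f) \le \hat{R}_0(f) + \sqrt{\frac{\ln(1/\delta)}{2n}}, $$

so the gap between the expected and empirical true risk converges to zero at rate $O(1/\sqrt{n})$. Bounds that hold uniformly over an entire hypothesis space add a capacity term (for example, one based on the VC dimension or covering numbers), which is where the choice of hypothesis space enters the analysis.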