Fig. 15.1 Taxonomy of statistical learning theory
problem, which serves as a reference for studying the properties of the different surrogate loss functions used by various learning algorithms. Usually the average surrogate loss on the training data is called the empirical surrogate risk (denoted $\hat{R}_\phi$), the expected surrogate loss on the entire product space of input and output is called the expected surrogate risk (denoted $R_\phi$), the average true loss on the training data is called the empirical true risk (denoted $\hat{R}_0$), and the expected true loss on the entire product space of input and output is called the expected true risk (denoted $R_0$).
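To make these four quantities concrete, the following is a minimal sketch (not from the original text) that computes the two empirical risks for a linear scoring function on a binary classification sample, assuming the hinge loss as the surrogate loss and the 0-1 loss as the true loss; all names and data are illustrative.

```python
import numpy as np

def empirical_risks(w, X, y):
    """Empirical surrogate risk (hinge loss) and empirical true risk (0-1 loss)
    of a linear scoring function f(x) = w . x on a training sample (X, y)."""
    margins = y * (X @ w)                       # y_i * f(x_i)
    surrogate = np.maximum(0.0, 1.0 - margins)  # hinge loss: max(0, 1 - margin)
    true_loss = (margins <= 0).astype(float)    # 0-1 loss: 1 iff sign(f(x)) != y
    return surrogate.mean(), true_loss.mean()   # \hat{R}_phi and \hat{R}_0

# Toy usage: 100 instances in R^5 with labels in {-1, +1}
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w = rng.normal(size=5)
y = np.sign(X @ w + 0.5 * rng.normal(size=100))
r_phi, r_0 = empirical_risks(w, X, y)
print(f"empirical surrogate risk = {r_phi:.3f}, empirical true risk = {r_0:.3f}")
```

The expected risks $R_\phi$ and $R_0$ are the corresponding expectations under the (unknown) data distribution; in practice they can only be estimated, e.g., on a large held-out sample.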
With the above settings, there are three major tasks in machine learning: the selection of the training data (sampled from the input and output spaces), the selection of the hypothesis space, and the selection of the surrogate loss function. In the figure, we refer to them as training set construction, model selection, and surrogate loss selection, respectively.
In order to select appropriate training data (e.g., to determine the number of training instances needed to guarantee a given test performance of the learned model), one needs to study the Generalization Ability of the algorithm as a function of the number of training instances. The generalization analysis of an algorithm is concerned with whether, and at what rate, its empirical risk converges to its expected risk as the number of training samples approaches infinity. Alternatively, the generalization ability of an algorithm is sometimes represented by a bound on the difference between its expected risk and its empirical risk; one then asks whether, and at what rate, this bound converges to zero as the number of training samples approaches infinity. In general, an algorithm is regarded as better than another if its empirical risk converges to the expected risk while that of the other does not. Furthermore, among algorithms whose empirical risks do converge, the one with the faster convergence rate is regarded as better.
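As a concrete instance of such a bound (illustrative, not taken from the original text): for a single fixed hypothesis $f$ whose true loss is bounded in $[0, 1]$, Hoeffding's inequality gives, with probability at least $1 - \delta$ over the draw of $n$ i.i.d. training samples,

$$ R_0(f) \le \hat{R}_0(f) + \sqrt{\frac{\ln(1/\delta)}{2n}}, $$

so the gap between the expected and empirical true risk converges to zero at rate $O(1/\sqrt{n})$. Bounds that hold uniformly over an entire hypothesis space add a capacity term (for example, one based on the VC dimension or covering numbers), which is where the choice of hypothesis space enters the analysis.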