Assuming, as in Sect. 3.1.1, that the data-generating process f is expressible by a
function, we can differentiate between two cases:
f is stationary. If the data-generating process does not change with time and
the full training set is available at once, any incremental learning method is
either only an incremental implementation of an equivalent batch learning
algorithm, or an approximation to it.
f is non-stationary. Learning a model of a non-stationary generating process is
only possible if the process varies slowly, that is, if it changes slowly with
respect to the rate at which it is observed. Hence, it is reasonable to assume
stationarity at least within a limited time-frame. This is modelled by putting
more weight on later observations, as earlier observations give general
information about the process but might reflect it in an outdated state. Such
recency-weighting of the observations is achieved very naturally within
incremental learning by assigning the current model a lower weight than new
observations, as sketched in the code after this list.
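The following Python sketch is my own minimal illustration of such a scheme rather than part of the text: with an assumed fixed step size gamma, the current model keeps the weight (1 - gamma) and each new observation enters with weight gamma, so that older observations decay geometrically.

# Minimal sketch (not from the text) of recency-weighting by exponential forgetting.
# The step size gamma is an assumed parameter; the current model keeps weight
# (1 - gamma) and each new observation enters with weight gamma.

def recency_weighted_update(estimate, observation, gamma=0.1):
    """Blend the current model estimate with one new observation."""
    return (1.0 - gamma) * estimate + gamma * observation

# Usage: tracking the mean of a process that drifts upwards half-way through.
estimate = 0.0
for y in [0.1, 0.2, 0.15, 0.6, 0.65, 0.7]:
    estimate = recency_weighted_update(estimate, y)
print(estimate)  # dominated by the later, larger observations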
The advantages of incremental learning methods over batch learning methods
are that the former can handle observations that arrive sequentially as a stream,
and that they handle non-stationary processes more naturally, even though the
latter feature can also be simulated by batch learning methods by weighting the
observations according to their temporal sequence¹ (see the sketch below). On the
downside, when compared to batch learning, incremental learners are generally less
transparent in what exactly they learn, and dynamically more complex.
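As a rough illustration of how a batch learner can simulate such recency-weighting (my own sketch with an assumed geometric weighting scheme, not taken from the text), the n-th of N observations can be given the weight lam**(N - n) for some lam in (0, 1), so that later observations dominate the estimate:

# A sketch of one assumed weighting scheme: a batch estimate that mimics
# recency-weighting by giving the n-th of N observations the weight lam**(N - n),
# so the most recent observation has weight 1 and older ones decay geometrically.

def weighted_batch_mean(observations, lam=0.9):
    """Weighted mean over the full batch; later observations carry more weight."""
    N = len(observations)
    weights = [lam ** (N - n) for n in range(1, N + 1)]
    return sum(w * y for w, y in zip(weights, observations)) / sum(weights)

print(weighted_batch_mean([0.1, 0.2, 0.15, 0.6, 0.65, 0.7]))  # close to the recent values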
With respect to the different tasks, incremental learners are particularly suited
to model-free RL, where the value function estimate is learned incrementally and
therefore changes slowly. Given that all data is available at once, regression and
classification tasks are best handled by batch learners.
From the theoretical perspective, incremental learners can be derived from a
batch learner that is applied to solve the same task. This has the advantage of
preserving the transparency of the batch learning method while acquiring the
flexibility of the incremental method. This principle is illustrated with the
following example.
Example 3.2 (Relating Batch and Incremental Learning). We want to estimate
the probability of a tossed coin showing head, without any initial bias about
its fairness. We perform N experiments with no input, $X = \emptyset$, and outputs
$Y = \{0, 1\}$, where 0 and 1 stand for tail and head respectively. Adopting a
frequentist approach, the probability of tossing a coin resulting in head can be
estimated by
\[
p_N(\mathrm{H}) = \frac{1}{N} \sum_{n=1}^{N} y_n , \qquad (3.5)
\]
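As a short Python sketch, given as a hedged illustration rather than the book's own derivation, the batch estimate (3.5) can be compared with an equivalent incremental update obtained by rewriting the sum recursively; both yield the same relative frequency of head:

# A sketch (an illustration, not the book's code) comparing the batch estimate
# of Eq. (3.5) with an equivalent incremental update obtained by rewriting the
# sum recursively: p_n = p_{n-1} + (y_n - p_{n-1}) / n.

def batch_estimate(tosses):
    """p_N(H) as in Eq. (3.5): relative frequency of head over all N tosses."""
    return sum(tosses) / len(tosses)

def incremental_estimate(tosses):
    """Process one toss at a time, never storing the full data set."""
    p = 0.0
    for n, y in enumerate(tosses, start=1):
        p += (y - p) / n
    return p

tosses = [1, 0, 1, 1, 0, 1]  # 1 = head, 0 = tail
assert abs(batch_estimate(tosses) - incremental_estimate(tosses)) < 1e-12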
¹ Naturally, in the case of weighting observations according to their temporal
sequence, the ordering of these observations is, in contrast to what was stated
previously in the batch learning context, of significance.
 