Online learning
The batch machine learning methods that we have applied so far focus on processing an existing, fixed set of training data. Typically, these techniques are also iterative: we have made multiple passes over the training data in order to converge to an optimal model.
By contrast, online learning is based on performing only one sequential pass through the training data in a fully incremental fashion (that is, one training example at a time). For each training example, the model first makes a prediction and then receives the true outcome (for example, the label for classification or the real-valued target for regression), which it uses to update itself. The idea behind online learning is that the model is updated continually as new information is received, instead of being retrained periodically in a batch process.
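The predict-then-update cycle described above can be sketched in a few lines of plain Python (no Spark). This is an illustrative sketch only: the 1-D linear model, the squared loss, the fixed learning rate, the class name OnlineLinearRegressor and the simulated stream are all assumptions, not something taken from the text.

```python
# Minimal sketch of online learning: a single sequential pass, one example at a time.
# Assumptions (illustrative, not from the text): a 1-D linear model y ~ w*x + b,
# squared loss, plain SGD updates and a fixed learning rate.


class OnlineLinearRegressor:
    """Hypothetical minimal model; updated in place after every example."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # Gradient step on 0.5 * (prediction - y)**2 with respect to w and b.
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def run_online(stream, model):
    """Consume each (x, y) pair exactly once, in arrival order."""
    errors = []
    for x, y in stream:
        prediction = model.predict(x)       # 1. predict before the label is known
        errors.append(abs(prediction - y))  # 2. receive the true outcome
        model.update(x, y)                  # 3. update immediately; no batch retraining
    return errors


# Simulated stream generated from y = 2x + 1 (hypothetical, noise-free data).
# A generator can only be iterated once, mirroring the single-pass constraint.
stream = (((i % 10) / 10.0, 2 * ((i % 10) / 10.0) + 1) for i in range(2000))
model = OnlineLinearRegressor()
errors = run_online(stream, model)
print(f"learned w={model.w:.3f}, b={model.b:.3f}")
print(f"mean |error|, first 100 examples: {sum(errors[:100]) / 100:.3f}")
print(f"mean |error|, last 100 examples:  {sum(errors[-100:]) / 100:.3f}")
```

The key design point is the ordering inside the loop: the error is measured on a prediction made before the model has seen the label, so the running error is an honest estimate of how the model performs on new data as it adapts.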
In some settings, when data volume is very large or the process that generates the data is
changing rapidly, online learning methods can adapt more quickly and in near real time,
without needing to be retrained in an expensive batch process.
However, online learning methods do not have to be used in a purely online manner. In fact, we have already seen an example of using an online learning model in the batch setting when we used stochastic gradient descent (SGD) optimization to train our classification and regression models. SGD updates the model after each training example. However, we still made multiple passes over the training data in order to converge to a better result.
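To make the contrast concrete, here is a minimal sketch, in plain Python rather than Spark, of SGD used in the batch setting: the update is still applied after each training example, but the loop runs over a fixed, fully available dataset several times. The linear model, learning rate, dataset and helper names (sgd_step, train_batch_sgd, mean_squared_error) are hypothetical and chosen only for illustration.

```python
import random

# Minimal sketch of SGD in the batch setting. Assumptions (illustrative): a 1-D
# linear model with squared loss, a fixed in-memory dataset, a fixed learning
# rate, and reshuffling between passes.


def sgd_step(w, b, x, y, learning_rate):
    """One per-example SGD update for squared loss on y ~ w*x + b."""
    error = (w * x + b) - y
    return w - learning_rate * error * x, b - learning_rate * error


def mean_squared_error(w, b, data):
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


def train_batch_sgd(data, epochs, learning_rate=0.1, seed=0):
    """Batch setting: per-example updates, repeated over several full passes."""
    rng = random.Random(seed)
    data = list(data)  # the whole training set is available up front (copied here)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # reshuffle each pass, a common SGD practice
        for x, y in data:
            w, b = sgd_step(w, b, x, y, learning_rate)
    return w, b


# Hypothetical fixed training set drawn from y = 3x - 0.5.
training_data = [(i / 50.0, 3 * (i / 50.0) - 0.5) for i in range(50)]
for epochs in (1, 30):
    w, b = train_batch_sgd(training_data, epochs)
    print(f"{epochs:>2} pass(es): w={w:.3f}, b={b:.3f}, "
          f"MSE={mean_squared_error(w, b, training_data):.6f}")
```

With a single pass the parameters are still far from the data-generating values; repeating the passes is what drives the error down, which is exactly the option a purely online learner does not have.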
In the pure online setting, we do not (or perhaps cannot) make multiple passes over the training data; hence, we need to process each input as it arrives. Online methods also include mini-batch methods where, instead of processing one input at a time, we process a small batch of training data.
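A mini-batch variant can be sketched by grouping the incoming stream into small chunks and making one update per chunk, using the gradient averaged over the chunk. As before, this is a plain-Python illustration under assumed choices (a 1-D linear model, squared loss, a batch size of 8, a fixed learning rate, simulated data); the helper names minibatches and minibatch_update are hypothetical.

```python
from itertools import islice

# Minimal sketch of mini-batch online learning. Assumptions (illustrative): a 1-D
# linear model y ~ w*x + b with squared loss, a batch size of 8, and a fixed
# learning rate; each example is still seen only once.


def minibatches(stream, batch_size):
    """Group an incoming stream into small batches without storing all of it."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def minibatch_update(w, b, batch, learning_rate):
    """One update using the squared-loss gradient averaged over the batch."""
    grad_w = grad_b = 0.0
    for x, y in batch:
        error = (w * x + b) - y
        grad_w += error * x
        grad_b += error
    n = len(batch)
    return w - learning_rate * grad_w / n, b - learning_rate * grad_b / n


# Hypothetical noise-free stream generated from y = 0.5x + 2.
stream = (((i % 20) / 20.0, 0.5 * ((i % 20) / 20.0) + 2.0) for i in range(8000))
w, b = 0.0, 0.0
updates = 0
for batch in minibatches(stream, batch_size=8):
    w, b = minibatch_update(w, b, batch, learning_rate=0.8)
    updates += 1
print(f"{updates} updates: w={w:.3f}, b={b:.3f}")
```

Averaging over a batch smooths each update and reduces the number of model updates (here 1,000 instead of 8,000), at the cost of waiting for a few examples before adapting; the final batch may be shorter than batch_size when the stream ends.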
Online and batch methods can also be combined in real-world situations. For example, we can periodically retrain our models offline (say, every day) using batch methods. We can then deploy the trained model to production and update it using online methods in real time (that is, during the day, in between batch retraining) to adapt to any changes in the environment.
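The combined pattern can be sketched as an offline retrain followed by intraday online updates. Everything here is an assumption made for illustration, not the book's setup: a 1-D linear model, a fixed learning rate, a "nightly" batch retrain over yesterday's data, and a simulated change in the environment in which the intercept shifts from 1.0 to 1.5 during the day.

```python
# Minimal sketch of combining batch retraining with online updates.
# Assumptions (illustrative): a 1-D linear model y ~ w*x + b, squared loss,
# a fixed learning rate, and a simulated drift in the data-generating process.


def sgd_step(w, b, x, y, lr):
    """One per-example SGD update for squared loss."""
    error = (w * x + b) - y
    return w - lr * error * x, b - lr * error


def batch_retrain(history, epochs=50, lr=0.1):
    """Offline step (e.g. nightly): several passes over all stored data."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in history:
            w, b = sgd_step(w, b, x, y, lr)
    return w, b


# Hypothetical data: yesterday y = 2x + 1; today the intercept has drifted to 1.5.
xs = [(i % 10) / 10.0 for i in range(200)]
yesterday = [(x, 2 * x + 1.0) for x in xs]
today = [(x, 2 * x + 1.5) for x in xs]

w, b = batch_retrain(yesterday)  # model deployed at the start of the day

# Baseline: keep the deployed model frozen until the next batch retrain.
frozen_error = sum(abs((w * x + b) - y) for x, y in today) / len(today)

# Online: predict each example, receive its label, then update in place.
online_error = 0.0
for x, y in today:
    online_error += abs((w * x + b) - y)
    w, b = sgd_step(w, b, x, y, lr=0.1)
online_error /= len(today)

print(f"frozen model MAE on today's data: {frozen_error:.3f}")
print(f"online-updated model MAE:         {online_error:.3f}")
```

The batch step gives a well-converged starting point, while the online step lets the deployed model track the shift in the intercept before the next offline retrain would pick it up.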
As we will see in this chapter, the online learning setting can fit neatly into stream processing and the Spark Streaming framework.