Real-time Machine Learning with Spark Streaming - Machine Learning with Spark

Chapter 10. Real-time Machine Learning with Spark Streaming

So far in this book, we have focused on batch data processing. That is, all our analysis, feature extraction, and model training have been applied to a fixed set of data that does not change. This fits neatly into Spark's core abstraction of RDDs, which are immutable distributed datasets. Once created, the data underlying an RDD does not change, although we might create new RDDs from the original RDD through Spark's transformation operators.

Our attention has also been on batch machine learning models, where we train a model on a fixed batch of training data that is usually represented as an RDD of feature vectors (and labels, in the case of supervised learning models).

In this chapter, we will:

• Introduce the concept of online learning, where models are trained and updated on new data as it becomes available

• Explore stream processing using Spark Streaming

• See how Spark Streaming fits together with the online learning approach
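
To make the idea of online learning concrete before turning to Spark Streaming, the following is a minimal sketch in plain Python that does not use Spark at all. It assumes a simple linear model updated by stochastic gradient descent, and a simulated stream of mini-batches. The function names (predict, update, make_stream), the learning rate, and the synthetic data are all hypothetical choices made for illustration; they are not part of any Spark API.

```python
import random

# Illustrative sketch only: plain Python, no Spark. All names are hypothetical.


def predict(weights, features):
    """Linear prediction: the dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def update(weights, batch, learning_rate=0.1):
    """Apply one stochastic gradient descent step per (features, label) pair."""
    for features, label in batch:
        error = predict(weights, features) - label
        weights = [w - learning_rate * error * x
                   for w, x in zip(weights, features)]
    return weights


def make_stream(num_batches=200, batch_size=10):
    """Simulate a stream of mini-batches generated by y = 2*x1 - 3*x2 (no noise)."""
    rng = random.Random(42)
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            batch.append((x, 2.0 * x[0] - 3.0 * x[1]))
        yield batch


# Start from an untrained model and refine it as each batch arrives,
# rather than training once on a fixed dataset.
weights = [0.0, 0.0]
for batch in make_stream():
    weights = update(weights, batch)
```

The point of the sketch is the shape of the loop: the model is never retrained from scratch, and each new batch only adjusts the current weights. With this noise-free synthetic data, the weights should move toward the generating coefficients of 2 and -3.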
