Vectors.dense(features))
}
The final step is to tell the model to train and test on our transformed DStream and also to
print out the first few elements of each batch in the DStream of predicted values:
// train and test model on the stream, and print predictions
// for illustrative purposes
model.trainOn(labeledStream)
model.predictOn(labeledStream).print()
ssc.start()
ssc.awaitTermination()
}
}
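For context, a minimal end-to-end sketch of the streaming regression program described above might look like the following. The socket host and port, batch interval, and feature dimension are illustrative assumptions, and note that in newer MLlib versions predictOn expects a DStream of Vector rather than LabeledPoint:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}

object SimpleStreamingModelSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("SimpleStreamingModelSketch")
    // batch interval of 10 seconds (an assumed value)
    val ssc = new StreamingContext(conf, Seconds(10))

    // parse each incoming record of the form "label,f1,f2,..." into a LabeledPoint
    val labeledStream = ssc.socketTextStream("localhost", 9999).map { event =>
      val split = event.split(",")
      val label = split.head.toDouble
      val features = split.tail.map(_.toDouble)
      LabeledPoint(label, Vectors.dense(features))
    }

    // streaming model with zero initial weights; the dimension must
    // match the feature vectors produced by the stream (assumed 100 here)
    val numFeatures = 100
    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.dense(Array.fill(numFeatures)(0.0)))

    // train on each batch, and print a few predictions per batch;
    // newer MLlib versions expect a DStream[Vector] for predictOn,
    // so we map each LabeledPoint to its feature vector first
    model.trainOn(labeledStream)
    model.predictOn(labeledStream.map(_.features)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```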
Tip
Note that because we are using the same MLlib model classes for streaming as we did for
batch processing, we can, if we choose, perform multiple iterations over the training data
in each batch (which is just an RDD of LabeledPoint instances).
Here, we will set the number of iterations to 1 to simulate purely online learning. In practice, you can set the number of iterations higher, but note that the training time per batch
will go up. If the training time per batch is much higher than the batch interval, the
streaming model will start to lag behind the velocity of the data stream.
This can be handled by decreasing the number of iterations, increasing the batch interval,
or increasing the parallelism of our streaming program by adding more Spark workers.
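As a sketch of this tuning (using the setter methods on MLlib's StreamingLinearRegressionWithSGD; the specific values are illustrative, not recommendations):

```scala
// More iterations per batch can improve the fit on each batch but
// increase per-batch training time; keep training time below the
// batch interval so the model does not fall behind the stream.
val tunedModel = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.dense(Array.fill(100)(0.0)))
  .setNumIterations(5)  // e.g. 5 passes over each batch instead of 1
  .setStepSize(0.01)    // smaller steps are often safer with more iterations
```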
Now, we're ready to run SimpleStreamingModel in our second terminal window using
sbt run in the same way as we did for the producer (remember to select the correct
main method for SBT to execute). Once the streaming program starts running, you should
see the following output in the producer console:
Got client connected from: /127.0.0.1
...
Created 10 events...