Comparing model performance with Spark Streaming
As we have used a known weight vector and intercept to generate the training data in our
producer application, we would expect our model to eventually learn this underlying
weight vector (in the absence of random noise, which we do not add for this example).
Therefore, we should see the model's error rate decrease over time, as it sees more and
more data. We can also use standard regression error metrics to compare the performance
of multiple models.
In this example, we will create two models with different learning rates, training them both
on the same data stream. We will then make predictions for each model and measure the
mean-squared error (MSE) and root mean-squared error (RMSE) metrics for each
batch.
Our new monitored streaming model code is shown here:
import breeze.linalg.DenseVector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * A streaming regression program that compares the
 * performance of two models, printing out metrics for
 * each batch.
 */
object MonitoringStreamingModel {

  import org.apache.spark.SparkContext._

  def main(args: Array[String]) {
    val ssc = new StreamingContext("local[2]", "First Streaming App", Seconds(10))
    val stream = ssc.socketTextStream("localhost", 9999)

    val NumFeatures = 100
    val zeroVector = DenseVector.zeros[Double](NumFeatures)

    // First model: a small step size (learning rate) of 0.01.
    val model1 = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.dense(zeroVector.data))
      .setNumIterations(1)
      .setStepSize(0.01)
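The rest of the listing creates the second model and computes the per-batch metrics. What follows is a minimal sketch of that logic: the second model's step size of 1.0, and the record format of a label followed by a tab and comma-separated features, are illustrative assumptions rather than values confirmed by this excerpt.

    // Second model: identical setup but a larger step size
    // (1.0 here is an assumed, illustrative learning rate).
    val model2 = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.dense(zeroVector.data))
      .setNumIterations(1)
      .setStepSize(1.0)

    // Parse each event into a labeled point, assuming records
    // of the form "label<TAB>f1,f2,...,f100".
    val labeledStream = stream.map { event =>
      val split = event.split("\t")
      val y = split(0).toDouble
      val features = split(1).split(",").map(_.toDouble)
      LabeledPoint(label = y, features = Vectors.dense(features))
    }

    // Train both models on the same input stream.
    model1.trainOn(labeledStream)
    model2.trainOn(labeledStream)

    // Compute each model's prediction error for every point,
    // using the latest model state for the current batch.
    val predsAndTrue = labeledStream.transform { rdd =>
      val latest1 = model1.latestModel()
      val latest2 = model2.latestModel()
      rdd.map { point =>
        val pred1 = latest1.predict(point.features)
        val pred2 = latest2.predict(point.features)
        (pred1 - point.label, pred2 - point.label)
      }
    }

    // Print MSE and RMSE for each model on every batch.
    predsAndTrue.foreachRDD { (rdd, time) =>
      val mse1 = rdd.map { case (err1, _) => err1 * err1 }.mean()
      val mse2 = rdd.map { case (_, err2) => err2 * err2 }.mean()
      println(s"Time: $time")
      println(s"MSE current batch: Model 1: $mse1; Model 2: $mse2")
      println(s"RMSE current batch: Model 1: ${math.sqrt(mse1)}; " +
        s"Model 2: ${math.sqrt(mse2)}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Because both models see exactly the same data in every batch, the side-by-side MSE and RMSE values make it straightforward to compare how the two learning rates affect convergence.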