Comparing model performance with Spark Streaming
As we have used a known weight vector and intercept to generate the training data in our
producer application, we would expect our model to eventually learn this underlying
weight vector (in the absence of random noise, which we do not add for this example).
Therefore, we should see the model's error rate decrease over time, as it sees more and
more data. We can also use standard regression error metrics to compare the performance
of multiple models.
In this example, we will create two models with different learning rates, training them both
on the same data stream. We will then make predictions for each model and measure the
mean-squared error (MSE) and root mean-squared error (RMSE) metrics for each
batch.
Our new monitored streaming model code is shown here:
import breeze.linalg.DenseVector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * A streaming regression program that compares the
 * performance of two models, printing out metrics for
 * each batch.
 */
object MonitoringStreamingModel {

  import org.apache.spark.SparkContext._

  def main(args: Array[String]) {
    val ssc = new StreamingContext("local[2]", "First Streaming App", Seconds(10))
    val stream = ssc.socketTextStream("localhost", 9999)

    val NumFeatures = 100
    val zeroVector = DenseVector.zeros[Double](NumFeatures)

    // First model: a small step size (learning rate) of 0.01.
    val model1 = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.dense(zeroVector.data))
      .setNumIterations(1)
      .setStepSize(0.01)
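The rest of the listing creates the second model and computes the per-batch metrics. What follows is a minimal sketch of that logic: the second model's step size of 1.0, and the record format of a label followed by a tab and comma-separated features, are illustrative assumptions rather than values confirmed by this excerpt.

    // Second model: identical setup but a larger step size
    // (1.0 here is an assumed, illustrative learning rate).
    val model2 = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.dense(zeroVector.data))
      .setNumIterations(1)
      .setStepSize(1.0)

    // Parse each event into a labeled point, assuming records
    // of the form "label<TAB>f1,f2,...,f100".
    val labeledStream = stream.map { event =>
      val split = event.split("\t")
      val y = split(0).toDouble
      val features = split(1).split(",").map(_.toDouble)
      LabeledPoint(label = y, features = Vectors.dense(features))
    }

    // Train both models on the same input stream.
    model1.trainOn(labeledStream)
    model2.trainOn(labeledStream)

    // Compute each model's prediction error for every point,
    // using the latest model state for the current batch.
    val predsAndTrue = labeledStream.transform { rdd =>
      val latest1 = model1.latestModel()
      val latest2 = model2.latestModel()
      rdd.map { point =>
        val pred1 = latest1.predict(point.features)
        val pred2 = latest2.predict(point.features)
        (pred1 - point.label, pred2 - point.label)
      }
    }

    // Print MSE and RMSE for each model on every batch.
    predsAndTrue.foreachRDD { (rdd, time) =>
      val mse1 = rdd.map { case (err1, _) => err1 * err1 }.mean()
      val mse2 = rdd.map { case (_, err2) => err2 * err2 }.mean()
      println(s"Time: $time")
      println(s"MSE current batch: Model 1: $mse1; Model 2: $mse2")
      println(s"RMSE current batch: Model 1: ${math.sqrt(mse1)}; " +
        s"Model 2: ${math.sqrt(mse2)}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Because both models see exactly the same data in every batch, the side-by-side MSE and RMSE values make it straightforward to compare how the two learning rates affect convergence.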