A simple streaming regression program
To illustrate the use of streaming regression, we will create a simple example similar to the
preceding one, which uses simulated data. We will write a producer program that generates
random feature vectors and target variables, given a fixed, known weight vector, and writes
each training example to a network stream.
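The text does not say how each training example is encoded on the network stream. The following is a minimal sketch that assumes a hypothetical one-line-per-example text format: the label, a tab, then the comma-separated feature values. It shows what the producer might write and how a consumer could parse the line back. `ExampleFormat`, `toLine`, and `fromLine` are illustrative names, and a `StringWriter` stands in for the socket so that the example runs without a network connection.

```scala
import java.io.{PrintWriter, StringWriter}

object ExampleFormat {
  // Hypothetical wire format (an assumption): label, a tab, then comma-separated features
  def toLine(label: Double, features: Array[Double]): String =
    s"$label\t${features.mkString(",")}"

  // Inverse of toLine: split off the label, then parse each feature value
  def fromLine(line: String): (Double, Array[Double]) = {
    val parts = line.split("\t")
    (parts(0).toDouble, parts(1).split(",").map(_.toDouble))
  }

  def main(args: Array[String]): Unit = {
    // A StringWriter stands in for the socket's output stream
    val buffer = new StringWriter()
    val out = new PrintWriter(buffer)
    out.println(toLine(1.5, Array(0.5, -2.0, 3.0)))
    out.flush()

    // Consumer side: read the line back and recover the training example
    val (label, features) = fromLine(buffer.toString.trim)
    // prints: label = 1.5, features = [0.5, -2.0, 3.0]
    println(s"label = $label, features = ${features.mkString("[", ", ", "]")}")
  }
}
```

A plain-text format like this keeps the producer and consumer decoupled: any program that can read lines from a socket can consume the stream.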
Our consumer application will run a streaming regression model, training and then testing
on our simulated data stream. Our first example consumer will simply print its predictions
to the console.
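Conceptually, a streaming regression model keeps a weight vector, updates it as each new batch of data arrives, and predicts with whatever weights it has learned so far. The following plain-Scala sketch illustrates that loop using ordinary mini-batch stochastic gradient descent on the squared loss. It is not Spark's streaming regression API. The object name, batch size, step size, number of features, and random seed are all assumptions chosen for illustration.

```scala
import scala.util.Random

object OnlineRegressionSketch {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (ai, bi) => ai * bi }.sum

  // One mini-batch SGD step on the squared loss (1/2)(w.x - y)^2,
  // with the gradient averaged over the batch
  def sgdStep(w: Array[Double], batch: Seq[(Double, Array[Double])],
              stepSize: Double): Array[Double] = {
    val grad = new Array[Double](w.length)
    for ((y, x) <- batch) {
      val err = dot(w, x) - y
      for (i <- w.indices) grad(i) += err * x(i)
    }
    Array.tabulate(w.length)(i => w(i) - stepSize * grad(i) / batch.size)
  }

  // Simulates the stream: each batch is first scored by the current model
  // (the "test" step), then used to update it (the "train" step).
  // Returns the true and the learned weight vectors.
  def run(numBatches: Int, batchSize: Int, numFeatures: Int, seed: Long,
          verbose: Boolean): (Array[Double], Array[Double]) = {
    val random = new Random(seed)
    val trueWeights = Array.fill(numFeatures)(random.nextGaussian())
    var w = Array.fill(numFeatures)(0.0)
    for (t <- 1 to numBatches) {
      val batch = Seq.fill(batchSize) {
        val x = Array.fill(numFeatures)(random.nextGaussian())
        (dot(trueWeights, x), x)
      }
      if (verbose && t % 50 == 0) {
        val (y, x) = batch.head
        println(f"batch $t%d: predicted ${dot(w, x)}%.4f, actual $y%.4f")
      }
      w = sgdStep(w, batch, stepSize = 0.1)
    }
    (trueWeights, w)
  }

  def main(args: Array[String]): Unit = {
    run(numBatches = 200, batchSize = 50, numFeatures = 5, seed = 42L, verbose = true)
  }
}
```

Because each batch is scored before the model trains on it, the printed predictions act as a simple test on data the model has not yet seen. With noise-free simulated data, the gap between predicted and actual values should shrink as more batches arrive.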
Creating a streaming data producer
The data producer operates in a manner similar to our product event producer example.
Recall from Chapter 5, Building a Classification Model with Spark, that a linear model is a
linear combination (or vector dot product) of a weight vector, w, and a feature vector, x
(that is, $w^T x$). Our producer will generate synthetic data using a fixed, known weight vector
and randomly generated feature vectors. This data fits the linear model formulation exactly,
so we expect our regression model to learn the true weight vector fairly easily.
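The data model described above can be sketched in plain Scala. This illustration is not the producer's actual code, and it makes several assumptions. It uses arrays instead of Breeze vectors and draws each feature from a standard normal distribution. It also adds no noise, so every label equals the dot product $w^T x$ exactly. `SyntheticLinearData` and `generateExample` are hypothetical names.

```scala
import scala.util.Random

object SyntheticLinearData {
  // Dot product w^T x
  def dot(w: Array[Double], x: Array[Double]): Double = {
    require(w.length == x.length, "weight and feature vectors must have the same length")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum
  }

  // One training example: fresh Gaussian features and the exact (noise-free) label
  def generateExample(trueWeights: Array[Double], random: Random): (Double, Array[Double]) = {
    val x = Array.fill(trueWeights.length)(random.nextGaussian())
    (dot(trueWeights, x), x)
  }

  def main(args: Array[String]): Unit = {
    val random = new Random(7)
    val numFeatures = 100
    // Fixed, known weight vector shared by every generated example
    val trueWeights = Array.fill(numFeatures)(random.nextGaussian())
    for (_ <- 1 to 3) {
      val (y, x) = generateExample(trueWeights, random)
      println(f"label = $y%.4f (first feature = ${x(0)}%.4f)")
    }
    // A hand-checkable case: 0.5*2 + (-1.0)*3 + 2.0*1 = 0.0
    println(dot(Array(0.5, -1.0, 2.0), Array(2.0, 3.0, 1.0))) // prints 0.0
  }
}
```

Because the labels carry no noise, a model that recovers `trueWeights` predicts every label exactly. This is why the learning task is expected to be easy.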
First, we will set up a maximum number of events per second (say, 100) and the number of
features in our feature vector (also 100 in this example):
import scala.util.Random

/**
 * A producer application that generates random linear
 * regression data.
 */
object StreamingModelProducer {
  import breeze.linalg._

  def main(args: Array[String]): Unit = {
    // Maximum number of events per second
    val MaxEvents = 100
    // Number of features in each generated feature vector
    val NumFeatures = 100
    // Source of the random feature values
    val random = new Random()