positiveExamples = spamFeatures.map(lambda features: LabeledPoint(1, features))
negativeExamples = normalFeatures.map(lambda features: LabeledPoint(0, features))
trainingData = positiveExamples.union(negativeExamples)
trainingData.cache()  # Cache since Logistic Regression is an iterative algorithm.
# Run Logistic Regression using the SGD algorithm.
model = LogisticRegressionWithSGD.train(trainingData)
# Test on a positive example (spam) and a negative one (normal). We first apply
# the same HashingTF feature transformation to get vectors, then apply the model.
posTest = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
negTest = tf.transform("Hi Dad, I started studying Spark the other ...".split(" "))
print("Prediction for positive test example: %g" % model.predict(posTest))
print("Prediction for negative test example: %g" % model.predict(negTest))
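The HashingTF step used above can be illustrated without Spark. The following is a minimal sketch of the hashing idea only, not MLlib's implementation: the function name hashing_tf is hypothetical, and zlib.crc32 is a stand-in hash chosen because it is deterministic across runs, not the hash MLlib uses. Each word is hashed to an index below the feature count, and the count at that index is incremented.

```python
import zlib

def hashing_tf(words, num_features=10000):
    """Map a list of words to a sparse term-frequency dict {index: count}."""
    counts = {}
    for word in words:
        # Assumption: crc32 is a stand-in for illustration, not MLlib's hash.
        index = zlib.crc32(word.encode("utf-8")) % num_features
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts

# The repeated word "cheap" lands on the same index both times.
vec = hashing_tf("cheap stuff cheap".split(" "))
print(vec)
```

Because distinct words can hash to the same index, a larger num_features lowers the chance of collisions at the cost of longer vectors.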
Example 11-2. Spam classifier in Scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
val spam = sc.textFile("spam.txt")
val normal = sc.textFile("normal.txt")
// Create a HashingTF instance to map email text to vectors of 10,000 features.
val tf = new HashingTF(numFeatures = 10000)
// Each email is split into words, and each word is mapped to one feature.
val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
val normalFeatures = normal.map(email => tf.transform(email.split(" ")))
// Create LabeledPoint datasets for positive (spam) and negative (normal) examples.
val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
val negativeExamples = normalFeatures.map(features => LabeledPoint(0, features))
val trainingData = positiveExamples.union(negativeExamples)
trainingData.cache() // Cache since Logistic Regression is an iterative algorithm.
// Run Logistic Regression using the SGD algorithm.
val model = new LogisticRegressionWithSGD().run(trainingData)
// Test on a positive example (spam) and a negative one (normal).
val posTest = tf.transform(
  "O M G GET cheap stuff by sending money to ...".split(" "))
val negTest = tf.transform(
  "Hi Dad, I started studying Spark the other ...".split(" "))
println("Prediction for positive test example: " + model.predict(posTest))
println("Prediction for negative test example: " + model.predict(negTest))
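Both listings finish by calling model.predict on a hashed feature vector. As a rough illustration of what a logistic regression prediction computes, here is a plain-Python sketch. The function name predict, the dict-based sparse vectors, the example weights, and the 0.5 cutoff are all assumptions made for this sketch, not values taken from the trained model above.

```python
import math

def predict(weights, intercept, features, threshold=0.5):
    """Return 1 if the logistic score exceeds the threshold, else 0."""
    # Dot product over the nonzero features of the sparse vector.
    margin = intercept + sum(weights.get(i, 0.0) * v for i, v in features.items())
    # The logistic (sigmoid) function maps the margin to a score in (0, 1).
    score = 1.0 / (1.0 + math.exp(-margin))
    return 1 if score > threshold else 0

# Hypothetical weights: feature 3 suggests spam, feature 7 suggests normal mail.
weights = {3: 2.0, 7: -2.0}
spam_like = predict(weights, 0.0, {3: 1.0})    # → 1
normal_like = predict(weights, 0.0, {7: 1.0})  # → 0
print(spam_like, normal_like)
```

Training with SGD amounts to adjusting the weights and intercept so that these scores match the labels in the LabeledPoint data.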
Example 11-3. Spam classifier in Java
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD;