packaging a simple word count example with both sbt and Maven. All of our examples can be built together, but to illustrate a stripped-down build with minimal dependencies we have a separate smaller project underneath the learning-spark-examples/mini-complete-example directory, as you can see in Examples 2-10 (Java) and 2-11 (Scala).
Example 2-10. Word count Java application—don't worry about the details yet
// Create a Java Spark Context
SparkConf conf = new SparkConf().setAppName("wordCount");
JavaSparkContext sc = new JavaSparkContext(conf);
// Load our input data.
JavaRDD<String> input = sc.textFile(inputFile);
// Split up into words.
JavaRDD<String> words = input.flatMap(
  new FlatMapFunction<String, String>() {
    public Iterable<String> call(String x) {
      return Arrays.asList(x.split(" "));
    }});
// Transform into pairs and count.
JavaPairRDD<String, Integer> counts = words.mapToPair(
  new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String x) {
      return new Tuple2(x, 1);
    }}).reduceByKey(new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer x, Integer y) { return x + y; }});
// Save the word count back out to a text file, causing evaluation.
counts.saveAsTextFile(outputFile);
Example 2-11. Word count Scala application—don't worry about the details yet
// Create a Scala Spark Context.
val conf = new SparkConf().setAppName("wordCount")
val sc = new SparkContext(conf)
// Load our input data.
val input = sc.textFile(inputFile)
// Split it up into words.
val words = input.flatMap(line => line.split(" "))
// Transform into pairs and count.
val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
// Save the word count back out to a text file, causing evaluation.
counts.saveAsTextFile(outputFile)
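To get a feel for the split/pair/reduce flow before running anything on a cluster, here is a plain-Java sketch of the same logic using java.util.stream. This is not Spark code; the class name, method, and sample input are ours, purely for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LocalWordCount {
    // Mirrors the Spark pipeline on an in-memory string:
    // split into words (flatMap), then group identical words
    // and sum their occurrences (mapToPair + reduceByKey).
    public static Map<String, Long> count(String text) {
        return Arrays.stream(text.split(" "))
                .collect(Collectors.groupingBy(Function.identity(),
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not to be"));
    }
}
```

The difference in Spark is only where the data lives: the RDD versions run the same transformations partition by partition across the cluster, and nothing executes until the `saveAsTextFile` action forces evaluation.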
We can build these applications using very simple build files with both sbt (Example 2-12) and Maven (Example 2-13). We've marked the Spark Core dependency as provided so that, later on, when we use an assembly JAR we don't include the spark-core JAR, which is already on the classpath of the workers.
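As a rough sketch of what that looks like in sbt, the dependency line takes a `"provided"` qualifier (the version number here is illustrative; use whichever Spark release you are targeting):

```scala
// build.sbt -- mark spark-core as provided so the assembly JAR excludes it
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"
```

Maven achieves the same thing with `<scope>provided</scope>` on the spark-core dependency; in both tools the classes are on the compile classpath but left out of the packaged artifact.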