packaging a simple word count example with both sbt and Maven. All of our examples can be built together, but to illustrate a stripped-down build with minimal dependencies we have a separate smaller project underneath the learning-spark-examples/mini-complete-example directory, as you can see in Examples 2-10 (Java) and 2-11 (Scala).
Example 2-10. Word count Java application—don't worry about the details yet
// Create a Java Spark Context
SparkConf conf = new SparkConf().setAppName("wordCount");
JavaSparkContext sc = new JavaSparkContext(conf);
// Load our input data.
JavaRDD<String> input = sc.textFile(inputFile);
// Split up into words.
JavaRDD<String> words = input.flatMap(
  new FlatMapFunction<String, String>() {
    public Iterable<String> call(String x) {
      return Arrays.asList(x.split(" "));
    }});
// Transform into pairs and count.
JavaPairRDD<String, Integer> counts = words.mapToPair(
  new PairFunction<String, String, Integer>(){
    public Tuple2<String, Integer> call(String x){
      return new Tuple2(x, 1);
    }}).reduceByKey(new Function2<Integer, Integer, Integer>(){
      public Integer call(Integer x, Integer y){ return x + y; }});
// Save the word count back out to a text file, causing evaluation.
counts.saveAsTextFile(outputFile);
Example 2-11. Word count Scala application—don't worry about the details yet
// Create a Scala Spark Context.
val conf = new SparkConf().setAppName("wordCount")
val sc = new SparkContext(conf)
// Load our input data.
val input = sc.textFile(inputFile)
// Split it up into words.
val words = input.flatMap(line => line.split(" "))
// Transform into pairs and count.
val counts = words.map(word => (word, 1)).reduceByKey{ case (x, y) => x + y }
// Save the word count back out to a text file, causing evaluation.
counts.saveAsTextFile(outputFile)
We can build these applications using very simple build files with both sbt (Example 2-12) and Maven (Example 2-13). We've marked the Spark Core dependency as provided so that, later on, when we use an assembly JAR we don't include the spark-core JAR, which is already on the classpath of the workers.
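To make the provided scoping concrete, a minimal sbt build file along these lines might look like the following sketch. The project name, Scala version, and Spark version shown here are illustrative assumptions, not the exact values from Example 2-12:

// build.sbt -- a minimal sketch; version numbers are assumptions

name := "mini-complete-example"

version := "0.0.1"

scalaVersion := "2.10.4"  // assumed Scala version

// "provided" keeps spark-core out of the assembly JAR, since the
// workers already have it on their classpath
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"  // assumed Spark version
)

With the dependency scoped this way, sbt still compiles against spark-core locally, but an assembly plugin will leave it out of the packaged JAR.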