The first step to a Spark program in Java
The Java API is very similar in principle to the Scala API. However, while Scala can call
Java code quite easily, it is not always possible to call Scala code from Java.
This is particularly the case when the Scala code makes use of certain Scala features such
as implicit conversions, default parameters, and the Scala reflection API.
Spark makes heavy use of these features in general, so it is necessary to have a separate
API specifically for Java that includes Java versions of the common classes. Hence,
SparkContext becomes JavaSparkContext, and RDD becomes JavaRDD.
Java versions prior to version 8 do not support anonymous functions and do not have
succinct syntax for functional-style programming, so functions in the Spark Java API must
implement one of the function interfaces in the org.apache.spark.api.java.function package
(such as Function), each of which declares a call method. While it is significantly more
verbose, we will often create one-off anonymous classes that implement such an interface
and its call method and pass them to our Spark operations, achieving much the same effect
as anonymous functions in Scala.
Spark provides support for Java 8's anonymous function (or lambda) syntax. Using this
syntax makes a Spark program written in Java 8 look very close to the equivalent Scala
program.
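The difference between the two styles can be sketched without a Spark installation. The following is a minimal illustration only: CallFunction and the map helper are hypothetical stand-ins that mirror the shape of a single-method interface with a call method, and they are not Spark's own classes. The same logic is written once as a one-off anonymous class and once as a Java 8 lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FunctionStyles {
    // Hypothetical stand-in for a Spark-style function interface:
    // a single method named call. This is NOT a Spark class.
    interface CallFunction<T, R> {
        R call(T input) throws Exception;
    }

    // Hypothetical helper that applies f to every element, loosely
    // imitating what a map operation does with the function it is given.
    static <T, R> List<R> map(List<T> input, CallFunction<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T item : input) {
            try {
                out.add(f.call(item));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return out;
    }

    // Pre-Java 8 style: a one-off anonymous class implementing call.
    static List<Integer> lengthsWithAnonymousClass(List<String> words) {
        return map(words, new CallFunction<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });
    }

    // Java 8 style: the same function written as a lambda.
    static List<Integer> lengthsWithLambda(List<String> words) {
        return map(words, s -> s.length());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java", "rdd");
        System.out.println(lengthsWithAnonymousClass(words));
        System.out.println(lengthsWithLambda(words));
    }
}
```

Both methods return the same list. The lambda version only removes the boilerplate of declaring the class and the method, which is why Java 8 code reads closer to its Scala counterpart.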
In Scala, an RDD of key/value pairs provides special operators (such as reduceByKey
and saveAsSequenceFile, for example) that are accessed automatically via implicit
conversions. In Java, special types of JavaRDD classes are required in order to access
similar functions. These include JavaPairRDD to work with key/value pairs and
JavaDoubleRDD to work with numerical records.
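Since Java has no implicit conversions, the program has to convert to a key/value type explicitly before key-based operators become available. The sketch below illustrates that idea with plain Java collections. SimpleRdd and SimplePairRdd are hypothetical stand-ins, not Spark's JavaRDD and JavaPairRDD, and their method names are chosen only to echo the operators mentioned above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

class PairTypeSketch {
    // Hypothetical stand-in for a collection of plain records (NOT JavaRDD).
    static class SimpleRdd<T> {
        private final List<T> items;

        SimpleRdd(List<T> items) {
            this.items = items;
        }

        // Explicit conversion step: turns each record into a key/value pair
        // and returns the dedicated pair type.
        <K, V> SimplePairRdd<K, V> mapToPair(Function<T, Map.Entry<K, V>> f) {
            List<Map.Entry<K, V>> pairs = new ArrayList<>();
            for (T item : items) {
                pairs.add(f.apply(item));
            }
            return new SimplePairRdd<>(pairs);
        }
    }

    // Hypothetical stand-in for a key/value collection (NOT JavaPairRDD).
    static class SimplePairRdd<K, V> {
        private final List<Map.Entry<K, V>> pairs;

        SimplePairRdd(List<Map.Entry<K, V>> pairs) {
            this.pairs = pairs;
        }

        // Key-based operator that exists only on the pair type.
        // Values sharing a key are merged with the given function.
        Map<K, V> reduceByKey(BinaryOperator<V> combine) {
            Map<K, V> result = new LinkedHashMap<>();
            for (Map.Entry<K, V> pair : pairs) {
                result.merge(pair.getKey(), pair.getValue(), combine);
            }
            return result;
        }
    }

    // Counts occurrences of each word via the explicit pair conversion.
    static Map<String, Integer> countWords(List<String> words) {
        SimpleRdd<String> records = new SimpleRdd<>(words);
        SimplePairRdd<String, Integer> pairs =
            records.mapToPair(w -> new SimpleEntry<>(w, 1));
        return pairs.reduceByKey((a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList("a", "b", "a")));
    }
}
```

Note that reduceByKey is not available on SimpleRdd at all. The compiler only accepts the call once the records have been converted to the pair type, which is the role the specialized classes play in the Java API.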
Tip
In this section, we covered the standard Java API syntax. For more details and examples
related to working with RDDs in Java as well as the Java 8 lambda syntax, see the Java
sections of the Spark Programming Guide found at
http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations.
We will see examples of most of these differences in the following Java program, which is
included in the example code of this chapter in the directory named java-spark-app.
The code directory also contains the CSV data file under the data subdirectory.