Scala
In Scala the conversion to RDDs with special functions (e.g., to expose numeric
functions on an RDD[Double]) is handled automatically using implicit conversions. As
mentioned in “Initializing a SparkContext” on page 17, we need to add import
org.apache.spark.SparkContext._ for these conversions to work. You can see the
implicit conversions listed in the SparkContext object's ScalaDoc. These implicits
turn an RDD into various wrapper classes, such as DoubleRDDFunctions (for RDDs
of numeric data) and PairRDDFunctions (for key/value pairs), to expose additional
functions such as mean() and variance().
Implicits, while quite powerful, can sometimes be confusing. If you call a function
like mean() on an RDD, you might look at the ScalaDoc for the RDD class and notice
there is no mean() function. The call succeeds because of the implicit conversion
from RDD[Double] to DoubleRDDFunctions. When searching the ScalaDoc for functions
available on your RDD, make sure to also look at the functions defined in these
wrapper classes.
Java
In Java the conversion between the specialized types of RDDs is a bit more explicit. In
particular, there are special classes called JavaDoubleRDD and JavaPairRDD for RDDs
of these types, with extra methods for these types of data. This has the benefit of
making it clearer what exactly is going on, but it can be a bit more cumbersome.
To construct RDDs of these special types, instead of always using the Function class
we will need to use specialized versions. If we want to create a DoubleRDD from an
RDD of type T, rather than using Function<T, Double> we use DoubleFunction<T>.
Table 3-5 shows the specialized functions and their uses.
We also need to call different functions on our RDD (so we can't just create a
DoubleFunction and pass it to map()). When we want a DoubleRDD back, instead of
calling map(), we need to call mapToDouble(). The other specialized functions follow
the same pattern.
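To make the mapToDouble() pattern concrete without requiring Spark on the classpath, here is a minimal stand-alone sketch. The types MiniRDD and MiniDoubleRDD, the nested Function and DoubleFunction interfaces, and the helper meanOfSquares() are all hypothetical stand-ins invented for this illustration. They are not Spark classes and only mirror the shape of the API described above: map() takes the general function and returns a generic collection, while mapToDouble() takes the specialized function and returns the type that carries mean().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch: every type below is a hypothetical stand-in, NOT a Spark class.
class SpecializedMapSketch {

    // Plays the role of the general Function<T, R> interface.
    interface Function<T, R> {
        R call(T value);
    }

    // Plays the role of DoubleFunction<T>: it returns a primitive double.
    interface DoubleFunction<T> {
        double call(T value);
    }

    // Stand-in for a generic RDD of T.
    static class MiniRDD<T> {
        private final List<T> items;

        MiniRDD(List<T> items) {
            this.items = items;
        }

        // map() accepts only the general function and returns a generic MiniRDD,
        // so the result has no numeric methods such as mean().
        <R> MiniRDD<R> map(Function<T, R> f) {
            List<R> out = new ArrayList<>();
            for (T item : items) {
                out.add(f.call(item));
            }
            return new MiniRDD<>(out);
        }

        // mapToDouble() accepts the specialized function and returns the
        // specialized type, which is where mean() lives.
        MiniDoubleRDD mapToDouble(DoubleFunction<T> f) {
            double[] out = new double[items.size()];
            for (int i = 0; i < out.length; i++) {
                out[i] = f.call(items.get(i));
            }
            return new MiniDoubleRDD(out);
        }
    }

    // Stand-in for an RDD of doubles with extra numeric methods.
    static class MiniDoubleRDD {
        private final double[] values;

        MiniDoubleRDD(double[] values) {
            this.values = values;
        }

        // Arithmetic mean; NaN for an empty collection.
        double mean() {
            double sum = 0.0;
            for (double v : values) {
                sum += v;
            }
            return sum / values.length;
        }
    }

    // Squares each element via mapToDouble() and averages the results.
    static double meanOfSquares(List<Integer> input) {
        MiniRDD<Integer> rdd = new MiniRDD<>(input);
        // rdd.map(x -> (double) x * x) would return a MiniRDD<Double>, which has no mean().
        MiniDoubleRDD squares = rdd.mapToDouble(x -> (double) x * x);
        return squares.mean();
    }

    public static void main(String[] args) {
        System.out.println(meanOfSquares(Arrays.asList(1, 2, 3, 4)));
    }
}
```

In real Spark code the same shape would presumably appear with a JavaRDD as the input and a JavaDoubleRDD as the result. The point of the sketch is only that the return type of the call, not the function body, decides which extra methods are available afterward.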
Table 3-5. Java interfaces for type-specific functions

Function name               Equivalent function*<A, B,…>     Usage
DoubleFlatMapFunction<T>    Function<T, Iterable<Double>>    DoubleRDD from a flatMapToDouble
DoubleFunction<T>           Function<T, Double>              DoubleRDD from mapToDouble