Spark operations are functional in style. For programmers familiar with functional
programming in Scala or Python, these operations should seem natural. For those without
experience in functional programming, don't worry; the Spark API is relatively easy to learn.
One of the most common transformations that you will use in Spark programs is the map
operator. This applies a function to each record of an RDD, thus mapping the input to
some new output. For example, the following code fragment takes the RDD we created
from a local text file and applies the size function to each record in the RDD. Remember
that we created an RDD of Strings. Using map, we can transform each string to an
integer, thus returning an RDD of Ints:
val intsFromStringsRDD = rddFromTextFile.map(line => line.size)
You should see output similar to the following line in your shell; this indicates the type of
the RDD:
intsFromStringsRDD: org.apache.spark.rdd.RDD[Int] =
MappedRDD[5] at map at <console>:14
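To see what map does to each record without starting Spark, the same transformation can be tried on an ordinary Scala collection. The following is only a minimal sketch: the object name MapSketch and the sample strings are hypothetical, and a local Seq stands in for the RDD that the text builds from a file.

```scala
// A plain Scala collection stands in for the RDD here, so no Spark
// dependency is needed to run this sketch.
object MapSketch {
  // Hypothetical sample records; the RDD in the text came from a local text file.
  val lines: Seq[String] = Seq("spark", "is", "fun")

  // Same shape as rddFromTextFile.map(line => line.size):
  // each String is mapped to its length, giving a collection of Ints.
  val sizes: Seq[Int] = lines.map(line => line.size)

  def main(args: Array[String]): Unit =
    println(sizes) // prints List(5, 2, 3)
}
```

The call has the same form on an RDD; the difference is that Spark applies the function to records distributed across the cluster rather than to a local list.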
In the preceding code, we saw the => syntax used. This is the Scala syntax for an
anonymous function, which is a function that is not a named method (that is, one defined
using the def keyword in Scala or Python, for example).
Note
While a detailed treatment of anonymous functions is beyond the scope of this topic, they
are used extensively in Spark code in Scala and Python, as well as in Java 8 (both in
examples and real-world applications), so it is useful to cover a few practicalities.
The line => line.size syntax means that we are applying a function where the
input variable is to the left of the => operator, and the output is the result of the code to the
right of the => operator. In this case, the input is line, and the output is the result of
calling line.size. In Scala, this function that maps a string to an integer is expressed as
String => Int.
This syntax saves us from having to separately define functions every time we use
methods such as map; this is useful when the function is simple and will only be used once, as
in this example.
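The trade-off between a named method and an anonymous function can be sketched as follows. This is an illustration only: the names AnonymousFunctionSketch, lineLength, and lineLengthFn are hypothetical, and a local Seq is again used in place of an RDD so that the code runs without Spark.

```scala
object AnonymousFunctionSketch {
  // A named method, defined separately with the def keyword.
  def lineLength(line: String): Int = line.size

  // The same logic as an anonymous function, stored in a value
  // whose type is written String => Int.
  val lineLengthFn: String => Int = line => line.size

  // Hypothetical sample records.
  val records: Seq[String] = Seq("a", "bb", "ccc")

  // Three equivalent ways to pass the function to map.
  val viaMethod: Seq[Int] = records.map(lineLength)        // named method
  val viaValue: Seq[Int]  = records.map(lineLengthFn)      // function value
  val viaInline: Seq[Int] = records.map(line => line.size) // inline anonymous function

  def main(args: Array[String]): Unit =
    println(viaMethod == viaInline && viaValue == viaInline) // prints true
}
```

All three calls produce the same result. The inline form is the shortest when the function is used only once, while a named method is easier to reuse and to test on its own.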