import org.apache.spark.rdd.RDD

val text = sc.textFile(inputPath)
val lower: RDD[String] = text.map(_.toLowerCase())
lower.foreach(println(_))
The map() method is a transformation, which Spark represents internally as a function
(toLowerCase()) to be called at some later time on each element in the input RDD
(text). The function is not actually called until the foreach() method (which is an
action) is invoked and Spark runs a job to read the input file and call toLowerCase()
on each line in it, before writing the result to the console.
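The deferred execution described above can be observed directly. The following is a minimal sketch, not the original program: it assumes Spark 2.0 or later (for longAccumulator), runs in local mode, replaces the input file with a small in-memory collection, and uses collect() rather than foreach() as the action so that the result can be inspected. The object name LazyMapDemo and its run() method are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical demo object; not part of the original example.
object LazyMapDemo {
  // Returns (calls seen before the action, calls seen after it, lowercased lines).
  def run(): (Long, Long, List[String]) = {
    // Assumption: local mode with two threads, so no cluster is needed.
    val conf = new SparkConf().setAppName("LazyMapDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Counts how many times the mapping function is actually invoked.
      val calls = sc.longAccumulator("toLowerCase calls")

      // Assumption: an in-memory dataset stands in for sc.textFile(inputPath).
      val text: RDD[String] = sc.parallelize(Seq("Hello", "WORLD", "Spark"))

      // Transformation: Spark only records the function here.
      val lower: RDD[String] = text.map { line =>
        calls.add(1)
        line.toLowerCase
      }
      val before = calls.sum // no job has run yet

      // Action: Spark runs a job, which calls the function on each element.
      val result = lower.collect().toList
      val after = calls.sum

      (before, after, result)
    } finally {
      sc.stop()
    }
  }
}
```

Calling LazyMapDemo.run() should show a call count of zero before the action and one call per element after it.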
One way of telling if an operation is a transformation or an action is by looking at its
return type: if the return type is RDD, then it's a transformation; otherwise, it's an action.
It's useful to know this when looking at the documentation for RDD (in the
org.apache.spark.rdd package), where most of the operations that can be performed
on RDDs can be found. More operations can be found in PairRDDFunctions,
which contains transformations and actions for working with RDDs of key-value pairs.
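To make the return-type rule concrete, the sketch below annotates each value with its type. It is an illustration under assumptions rather than code from the original text: it assumes Spark 1.3 or later, where the implicit conversion to PairRDDFunctions applies without an extra import, and it runs in local mode on an in-memory collection. The object name ReturnTypeDemo and its run() method are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical demo object; not part of the original example.
object ReturnTypeDemo {
  // Returns (number of words, per-word counts).
  def run(): (Long, Map[String, Int]) = {
    // Assumption: local mode, so no cluster is needed.
    val conf = new SparkConf().setAppName("ReturnTypeDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val words: RDD[String] = sc.parallelize(Seq("a", "b", "a", "c", "a"))

      // Return type is an RDD, so these are transformations.
      val pairs: RDD[(String, Int)] = words.map(w => (w, 1))
      // reduceByKey() is defined in PairRDDFunctions and is available
      // on RDDs of key-value pairs through an implicit conversion.
      val counts: RDD[(String, Int)] = pairs.reduceByKey(_ + _)

      // Return type is not an RDD, so these are actions.
      val total: Long = words.count()
      // collectAsMap() is also defined in PairRDDFunctions.
      val asMap: Map[String, Int] = counts.collectAsMap().toMap

      (total, asMap)
    } finally {
      sc.stop()
    }
  }
}
```

Here map() and reduceByKey() only build up new RDDs, while count() and collectAsMap() each trigger a job and return ordinary Scala values to the driver.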
Spark's library contains a rich set of operators, including transformations for mapping,
grouping, aggregating, repartitioning, sampling, and joining RDDs, and for treating RDDs
as sets. There are also actions for materializing RDDs as collections, computing statistics
on RDDs, sampling a fixed number of elements from an RDD, and saving RDDs to
external storage. For details, consult the class documentation.
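A few of these operators are shown in the sketch below, which is only a small sample and not a complete tour. It assumes local mode and in-memory data; the object name OperatorTourDemo, its run() method, and the seed value are hypothetical choices made for illustration. Saving to external storage is left out to keep the example free of file-system side effects.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical demo object; not part of the original example.
object OperatorTourDemo {
  // Returns (distinct union, intersection, sample size, mean of the first RDD).
  def run(): (List[Int], List[Int], Int, Double) = {
    // Assumption: local mode, so no cluster is needed.
    val conf = new SparkConf().setAppName("OperatorTourDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val a: RDD[Int] = sc.parallelize(Seq(1, 2, 3, 4))
      val b: RDD[Int] = sc.parallelize(Seq(3, 4, 5))

      // Transformations that treat RDDs as sets.
      val all: RDD[Int] = a.union(b).distinct()
      val common: RDD[Int] = a.intersection(b)

      // Action: materialize an RDD as a local collection.
      // The results are sorted because element order is not guaranteed here.
      val allSorted = all.collect().sorted.toList
      val commonSorted = common.collect().sorted.toList

      // Action: sample a fixed number of elements (seed chosen arbitrarily).
      val sampled: Array[Int] = a.takeSample(withReplacement = false, num = 2, seed = 42L)

      // Action: compute summary statistics on an RDD of doubles.
      val mean: Double = a.map(_.toDouble).stats().mean

      (allSorted, commonSorted, sampled.length, mean)
    } finally {
      sc.stop()
    }
  }
}
```

Note that takeSample() is an action that returns an array of the requested size, whereas the sample() transformation takes a fraction and returns another RDD.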