whether a given function is a transformation or an action, you can look at its return
type: transformations return RDDs, whereas actions return some other data type.
Transformations
Transformations are operations on RDDs that return a new RDD. As discussed in
“Lazy Evaluation” on page 29, transformed RDDs are computed lazily, only when you
use them in an action. Many transformations are element-wise; that is, they work on
one element at a time, but this is not true of all transformations.
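To make the lazy behavior concrete, here is a minimal sketch in plain Python. The ToyRDD class and its methods are hypothetical names invented for this illustration; they only mimic the shape of the RDD API and are not how Spark is implemented. The point is that calling a transformation does no work, and the predicate runs only once an action asks for a result.

```python
class ToyRDD:
    """A toy stand-in for an RDD (an assumption for illustration, not Spark)."""

    def __init__(self, compute):
        # compute is a zero-argument function that yields the elements on demand
        self._compute = compute

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new ToyRDD and do no work yet.
    def filter(self, predicate):
        return ToyRDD(lambda: (x for x in self._compute() if predicate(x)))

    def map(self, func):
        return ToyRDD(lambda: (func(x) for x in self._compute()))

    # Actions: run the pipeline and return something that is not a ToyRDD.
    def count(self):
        return sum(1 for _ in self._compute())

    def collect(self):
        return list(self._compute())


calls = []  # records every line the predicate is asked about


def is_error(line):
    calls.append(line)
    return "error" in line


lines = ToyRDD.from_list(["error: disk full", "info: started", "error: timeout"])
errors = lines.filter(is_error)       # transformation: nothing is evaluated here
calls_before_action = len(calls)      # still zero predicate calls
error_count = errors.count()          # action: the filter actually runs now
calls_after_action = len(calls)       # one predicate call per input line
```

Note that filter() here is element-wise, looking at one line at a time, and that the return types follow the rule of thumb above: filter() hands back another ToyRDD, while count() hands back a plain integer.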
As an example, suppose that we have a logfile, log.txt, with a number of messages,
and we want to select only the error messages. We can use the filter() transformation
seen before. This time, though, we'll show a filter in all three of Spark's language
APIs (Examples 3-11 through 3-13).
Example 3-11. filter() transformation in Python
inputRDD = sc.textFile("log.txt")
errorsRDD = inputRDD.filter(lambda x: "error" in x)
Example 3-12. filter() transformation in Scala
val inputRDD = sc.textFile("log.txt")
val errorsRDD = inputRDD.filter(line => line.contains("error"))
Example 3-13. filter() transformation in Java
// Function here is org.apache.spark.api.java.function.Function
JavaRDD<String> inputRDD = sc.textFile("log.txt");
JavaRDD<String> errorsRDD = inputRDD.filter(
  new Function<String, Boolean>() {
    public Boolean call(String x) { return x.contains("error"); }
  });
Note that the filter() operation does not mutate the existing inputRDD. Instead, it
returns a pointer to an entirely new RDD. inputRDD can still be reused later in the
program, for instance, to search for other words. In fact, let's use inputRDD again to
search for lines with the word warning in them. Then, we'll use another transformation,
union(), to combine the two results, so that we can print out the number of lines that
contained either error or warning.
We show Python in Example 3-14, but the union() function is identical in all three
languages.
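Before turning to Spark itself, the semantics of this plan can be previewed in plain Python. In the sketch below, ordinary lists stand in for RDDs and the sample log lines are made up for illustration, so none of this is Spark code. It shows two things: filtering leaves the source collection untouched, and a union that simply concatenates keeps duplicates, which is also how Spark's union() behaves.

```python
# Plain Python lists stand in for RDDs here; this is not Spark code.
log_lines = [
    "error: disk full",
    "warning: disk almost full",
    "info: job started",
    "error: retrying after warning",
]

# "filter" builds new collections and leaves log_lines untouched.
errors = [line for line in log_lines if "error" in line]
warnings = [line for line in log_lines if "warning" in line]

# "union" is modeled as concatenation, which keeps duplicates:
# the last sample line matches both filters and so appears twice.
bad_lines = errors + warnings

# Removing duplicates is a separate, explicit step.
distinct_bad_lines = set(bad_lines)

print(len(log_lines), len(bad_lines), len(distinct_bad_lines))
```

If a line mentioning both words should be counted only once, the duplicates have to be removed explicitly; in Spark that is the job of the distinct() transformation.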