Database Reference
In-Depth Information
Let's look at a basic example of map() that squares all of the numbers in an RDD
(Examples 3-26 through 3-28 ).
Example 3-26. Python squaring the values in an RDD
nums = sc . parallelize ([ 1 , 2 , 3 , 4 ])
squared = nums . map ( lambda x : x * x ) . collect ()
for num in squared :
print " %i " % ( num )
Example 3-27. Scala squaring the values in an RDD
val input = sc . parallelize ( List ( 1 , 2 , 3 , 4 ))
val result = input . map ( x => x * x )
println ( result . collect (). mkString ( "," ))
Example 3-28. Java squaring the values in an RDD
JavaRDD < Integer > rdd = sc . parallelize ( Arrays . asList ( 1 , 2 , 3 , 4 ));
JavaRDD < Integer > result = rdd . map ( new Function < Integer , Integer >() {
public Integer call ( Integer x ) { return x * x ; }
});
System . out . println ( StringUtils . join ( result . collect (), "," ));
Sometimes we want to produce multiple output elements for each input element. The
operation to do this is called flatMap() . As with map() , the function we provide to
flatMap() is called individually for each element in our input RDD. Instead of
returning a single element, we return an iterator with our return values. Rather than
producing an RDD of iterators, we get back an RDD that consists of the elements
from all of the iterators. A simple usage of flatMap() is splitting up an input string
into words, as shown in Examples 3-29 through 3-31 .
Example 3-29. flatMap() in Python, splitting lines into words
lines = sc . parallelize ([ "hello world" , "hi" ])
words = lines . flatMap ( lambda line : line . split ( " " ))
words . first () # returns "hello"
Example 3-30. flatMap() in Scala, splitting lines into multiple words
val lines = sc . parallelize ( List ( "hello world" , "hi" ))
val words = lines . flatMap ( line => line . split ( " " ))
words . first () // returns "hello"
Search WWH ::




Custom Search