Let's look at a basic example of map() that squares all of the numbers in an RDD (Examples 3-26 through 3-28).
Example 3-26. Python squaring the values in an RDD
nums = sc.parallelize([1, 2, 3, 4])
squared = nums.map(lambda x: x * x).collect()
for num in squared:
    print("%i " % num)
Example 3-27. Scala squaring the values in an RDD
val input = sc.parallelize(List(1, 2, 3, 4))
val result = input.map(x => x * x)
println(result.collect().mkString(","))
Example 3-28. Java squaring the values in an RDD
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4));
JavaRDD<Integer> result = rdd.map(new Function<Integer, Integer>() {
  public Integer call(Integer x) { return x * x; }
});
System.out.println(StringUtils.join(result.collect(), ","));
Sometimes we want to produce multiple output elements for each input element. The operation to do this is called flatMap(). As with map(), the function we provide to flatMap() is called individually for each element in our input RDD. Instead of returning a single element, we return an iterator with our return values. Rather than producing an RDD of iterators, we get back an RDD that consists of the elements from all of the iterators. A simple usage of flatMap() is splitting up an input string into words, as shown in Examples 3-29 through 3-31.
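Before turning to the Spark examples, the flattening behavior described above can be sketched in plain Python, without Spark. The helper names local_map and local_flat_map are hypothetical and are not part of the Spark API; they operate on an ordinary list and only illustrate the difference in output shape between the two operations.

```python
from itertools import chain


def local_map(func, elements):
    """Apply func to each element, producing exactly one output per input."""
    return [func(x) for x in elements]


def local_flat_map(func, elements):
    """Apply func to each element and concatenate the iterables it returns."""
    return list(chain.from_iterable(func(x) for x in elements))


lines = ["hello world", "hi"]

# Mapping keeps one result per input line, so we get a list of lists.
mapped = local_map(lambda line: line.split(" "), lines)

# Flat-mapping merges the per-line results into a single sequence of words.
flattened = local_flat_map(lambda line: line.split(" "), lines)

print(mapped)     # [['hello', 'world'], ['hi']]
print(flattened)  # ['hello', 'world', 'hi']
```

With the same splitting function, the mapped result has two elements (one list per line), while the flattened result has three elements (one per word).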
Example 3-29. flatMap() in Python, splitting lines into words
lines = sc.parallelize(["hello world", "hi"])
words = lines.flatMap(lambda line: line.split(" "))
words.first()  # returns "hello"
Example 3-30. flatMap() in Scala, splitting lines into multiple words
val lines = sc.parallelize(List("hello world", "hi"))
val words = lines.flatMap(line => line.split(" "))
words.first() // returns "hello"