Example 3-31. flatMap() in Java, splitting lines into multiple words
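import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

// sc is an existing JavaSparkContext.
JavaRDD<String> lines = sc.parallelize(Arrays.asList("hello world", "hi"));
JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
  // Spark 1.x signature; in Spark 2.0 and later, call() returns an Iterator<String>.
  public Iterable<String> call(String line) {
    return Arrays.asList(line.split(" ")); // one input line -> many words
  }
});
words.first(); // returns "hello"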
We illustrate the difference between flatMap() and map() in Figure 3-3. You can think of flatMap() as “flattening” the iterators returned to it, so that instead of ending up with an RDD of lists we have an RDD of the elements in those lists.
Figure 3-3. Difference between flatMap() and map() on an RDD
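The same contrast can be reproduced locally without a cluster. The sketch below uses java.util.stream as a stand-in for Spark, on the assumption that Stream.map() and Stream.flatMap() are close analogues of the RDD operations. The class and method names (FlatMapVsMap, mapToWordLists, flatMapToWords) are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Local analogy using java.util.stream, NOT Spark: Stream.map and Stream.flatMap
// are assumed to behave like RDD.map() and RDD.flatMap() on the same input.
class FlatMapVsMap {
    // map(): exactly one output per input line, so each output element is a List of words.
    static List<List<String>> mapToWordLists(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.asList(line.split(" ")))
                .collect(Collectors.toList());
    }

    // flatMap(): each line's words are flattened into one sequence of words.
    static List<String> flatMapToWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hi");
        System.out.println(mapToWordLists(lines)); // [[hello, world], [hi]]
        System.out.println(flatMapToWords(lines)); // [hello, world, hi]
    }
}
```

With map(), two input lines yield two elements, each a list; with flatMap(), the same two lines yield three words.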
Pseudo set operations
RDDs support many of the operations of mathematical sets, such as union and intersection, even when the RDDs themselves are not properly sets. Four operations are shown in Figure 3-4. Note that all of these operations require the RDDs being operated on to be of the same type.
Figure 3-4. Some simple set operations
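To make the "pseudo" part concrete, here is a local model of four such operations using plain Java collections rather than Spark. It assumes Spark's behavior that union() and subtract() keep duplicates from their inputs while intersection() and distinct() remove them; both inputs share one element type T, mirroring the same-type requirement. The class name PseudoSetOps is hypothetical, and the result ordering is an artifact of this local model (Spark makes no ordering promise).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Local model of RDD pseudo set operations with plain Java collections (NOT Spark).
// Both arguments share the element type T, as the RDD operations require.
class PseudoSetOps {
    // Like RDD.union(): concatenate the inputs, keeping duplicates.
    static <T> List<T> union(List<T> a, List<T> b) {
        List<T> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Like RDD.intersection(): elements found in both inputs, duplicates removed.
    static <T> List<T> intersection(List<T> a, List<T> b) {
        Set<T> out = new LinkedHashSet<>(a);
        out.retainAll(new HashSet<>(b));
        return new ArrayList<>(out);
    }

    // Like RDD.subtract(): elements of a not present in b; duplicates within a are kept.
    static <T> List<T> subtract(List<T> a, List<T> b) {
        Set<T> exclude = new HashSet<>(b);
        List<T> out = new ArrayList<>();
        for (T x : a) {
            if (!exclude.contains(x)) {
                out.add(x);
            }
        }
        return out;
    }

    // Like RDD.distinct(): each element appears once.
    static <T> List<T> distinct(List<T> a) {
        return new ArrayList<>(new LinkedHashSet<>(a));
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 1, 2, 3);
        List<Integer> b = Arrays.asList(3, 4);
        System.out.println(union(a, b));        // [1, 1, 2, 3, 3, 4]
        System.out.println(intersection(a, b)); // [3]
        System.out.println(subtract(a, b));     // [1, 1, 2]
        System.out.println(distinct(a));        // [1, 2, 3]
    }
}
```

The duplicate in the union output is exactly why these are only pseudo set operations: a true set union would contain each element once.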