Table 3-2. Basic RDD transformations on an RDD containing {1, 2, 3, 3}

| Function name | Purpose | Example | Result |
|---|---|---|---|
| map() | Apply a function to each element in the RDD and return an RDD of the result. | rdd.map(x => x + 1) | {2, 3, 4, 4} |
| flatMap() | Apply a function to each element in the RDD and return an RDD of the contents of the iterators returned. Often used to extract words. | rdd.flatMap(x => x.to(3)) | {1, 2, 3, 2, 3, 3, 3} |
| filter() | Return an RDD consisting of only elements that pass the condition passed to filter(). | rdd.filter(x => x != 1) | {2, 3, 3} |
| distinct() | Remove duplicates. | rdd.distinct() | {1, 2, 3} |
| sample(withReplacement, fraction, [seed]) | Sample an RDD, with or without replacement. | rdd.sample(false, 0.5) | Nondeterministic |
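
The rows of Table 3-2 can be tried out with a minimal sketch like the one below. It assumes Spark core is on the classpath and uses a local master; the object name BasicTransformations, the results helper, and the application name are illustrative choices, not part of the original text. The result of distinct() is sorted after collecting because it involves a shuffle, and no fixed output is asserted for sample() because it is nondeterministic.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object; the name is an assumption for illustration.
object BasicTransformations {
  // Runs each transformation from Table 3-2 and collects the result to the driver.
  def results(sc: SparkContext): Map[String, List[Int]] = {
    val rdd = sc.parallelize(Seq(1, 2, 3, 3))
    Map(
      "map"      -> rdd.map(x => x + 1).collect().toList,
      "flatMap"  -> rdd.flatMap(x => x.to(3)).collect().toList,
      "filter"   -> rdd.filter(x => x != 1).collect().toList,
      // distinct() shuffles data, so sort on the driver for a stable order.
      "distinct" -> rdd.distinct().collect().toList.sorted,
      // No seed is given, so the sampled elements vary from run to run.
      "sample"   -> rdd.sample(false, 0.5).collect().toList
    )
  }

  def main(args: Array[String]): Unit = {
    // A local master is assumed here so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("basic-transformations").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      results(sc).foreach { case (name, values) => println(s"$name: $values") }
    } finally {
      sc.stop()
    }
  }
}
```

Each transformation is lazy; nothing is computed until collect() is called on the resulting RDD.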
Table 3-3. Two-RDD transformations on RDDs containing {1, 2, 3} and {3, 4, 5}

| Function name | Purpose | Example | Result |
|---|---|---|---|
| union() | Produce an RDD containing elements from both RDDs. | rdd.union(other) | {1, 2, 3, 3, 4, 5} |
| intersection() | RDD containing only elements found in both RDDs. | rdd.intersection(other) | {3} |
| subtract() | Remove the contents of one RDD (e.g., remove training data). | rdd.subtract(other) | {1, 2} |
| cartesian() | Cartesian product with the other RDD. | rdd.cartesian(other) | {(1, 3), (1, 4), … (3, 5)} |
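
The two-RDD operations in Table 3-3 can be exercised the same way. The sketch below again assumes Spark core on the classpath and a local master; TwoRddTransformations, Results, and the results helper are hypothetical names introduced only for this illustration. Every collected result is sorted on the driver so the comparison does not depend on partition order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object; the names are assumptions for illustration.
object TwoRddTransformations {
  final case class Results(
    union: List[Int],
    intersection: List[Int],
    subtract: List[Int],
    cartesian: List[(Int, Int)]
  )

  // Applies each transformation from Table 3-3 to {1, 2, 3} and {3, 4, 5}.
  def results(sc: SparkContext): Results = {
    val rdd = sc.parallelize(Seq(1, 2, 3))
    val other = sc.parallelize(Seq(3, 4, 5))
    Results(
      // union() keeps duplicates: 3 appears once per input RDD.
      union = rdd.union(other).collect().toList.sorted,
      intersection = rdd.intersection(other).collect().toList.sorted,
      subtract = rdd.subtract(other).collect().toList.sorted,
      // cartesian() pairs every element of rdd with every element of other.
      cartesian = rdd.cartesian(other).collect().toList.sorted
    )
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("two-rdd-transformations").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(results(sc))
    } finally {
      sc.stop()
    }
  }
}
```

With three elements in each input, cartesian() produces nine pairs, which is why it becomes expensive on large RDDs.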
Actions
The action you will probably use most often on basic RDDs is reduce(), which takes
a function that operates on two elements of the type in your RDD and returns a new
element of the same type. A simple example of such a function is +, which sums the
RDD. With reduce(), we can easily sum the elements of our RDD, count
the number of elements, and perform other types of aggregations (see Examples 3-32
through 3-34).
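
As a minimal sketch of the idea (not one of the book's numbered examples), the code below sums and counts the elements of {1, 2, 3, 3} with reduce(). It assumes Spark core on the classpath and a local master; ReduceSketch and sumAndCount are hypothetical names. Because reduce() returns a value of the RDD's element type, counting is done by first mapping every element to 1 and then adding the ones.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object; the names are assumptions for illustration.
object ReduceSketch {
  // Returns (sum of elements, number of elements) for the RDD {1, 2, 3, 3}.
  def sumAndCount(sc: SparkContext): (Int, Long) = {
    val rdd = sc.parallelize(Seq(1, 2, 3, 3))
    // + takes two Ints and returns an Int, so it fits reduce() directly.
    val sum = rdd.reduce((x, y) => x + y)
    // Map each element to 1L first, then add the ones to get a count.
    val count = rdd.map(_ => 1L).reduce(_ + _)
    (sum, count)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("reduce-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (sum, count) = sumAndCount(sc)
      println(s"sum = $sum, count = $count") // sum = 9, count = 4
    } finally {
      sc.stop()
    }
  }
}
```

The function passed to reduce() should be commutative and associative, since Spark combines partial results from different partitions in no guaranteed order.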
 