| Function name | Purpose | Example | Result |
|---|---|---|---|
| take(num) | Return num elements from the RDD. | rdd.take(2) | {1, 2} |
| top(num) | Return the top num elements of the RDD. | rdd.top(2) | {3, 3} |
| takeOrdered(num)(ordering) | Return num elements based on the provided ordering. | rdd.takeOrdered(2)(myOrdering) | {3, 3} |
| takeSample(withReplacement, num, [seed]) | Return num elements at random. | rdd.takeSample(false, 1) | Nondeterministic |
| reduce(func) | Combine the elements of the RDD together in parallel (e.g., sum). | rdd.reduce((x, y) => x + y) | 9 |
| fold(zero)(func) | Same as reduce() but with the provided zero value. | rdd.fold(0)((x, y) => x + y) | 9 |
| aggregate(zeroValue)(seqOp, combOp) | Similar to reduce() but used to return a different type. | rdd.aggregate((0, 0))((x, y) => (x._1 + y, x._2 + 1), (x, y) => (x._1 + y._1, x._2 + y._2)) | (9, 4) |
| foreach(func) | Apply the provided function to each element of the RDD. | rdd.foreach(func) | Nothing |
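The actions in the table can be tried together in a small program. The following is a minimal sketch, not a definitive implementation: it assumes Spark is on the classpath and runs in local mode, and it assumes the input RDD contains {1, 2, 3, 3}, which is consistent with the results listed in the table. The names BasicActionsSketch, ActionResults, and compute are hypothetical, and myOrdering is taken here to be a reversed integer ordering.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical container for the results of each action.
case class ActionResults(
    firstTwo: List[Int],
    topTwo: List[Int],
    orderedTwo: List[Int],
    sampleSize: Int,
    sum: Int,
    foldedSum: Int,
    sumAndCount: (Int, Int))

object BasicActionsSketch {
  def compute(sc: SparkContext): ActionResults = {
    // Assumed input, chosen to match the results shown in the table.
    val rdd = sc.parallelize(Seq(1, 2, 3, 3))

    // Assumption: myOrdering sorts integers in descending order.
    val myOrdering: Ordering[Int] = Ordering[Int].reverse

    // aggregate() returns a different type from the elements:
    // here a (sum, count) pair built from Int elements.
    val sumAndCount = rdd.aggregate((0, 0))(
      (acc, value) => (acc._1 + value, acc._2 + 1), // fold one element into a partition's accumulator
      (a, b) => (a._1 + b._1, a._2 + b._2))         // merge accumulators from two partitions

    // foreach() returns nothing; it runs only for its side effects.
    // Output order is not guaranteed.
    rdd.foreach(x => println(x))

    ActionResults(
      firstTwo = rdd.take(2).toList,
      topTwo = rdd.top(2).toList,
      orderedTwo = rdd.takeOrdered(2)(myOrdering).toList,
      // takeSample() is nondeterministic, so only its size is recorded.
      sampleSize = rdd.takeSample(withReplacement = false, num = 1).length,
      sum = rdd.reduce((x, y) => x + y),
      foldedSum = rdd.fold(0)((x, y) => x + y),
      sumAndCount = sumAndCount)
  }

  def main(args: Array[String]): Unit = {
    // Master URL and application name are illustrative choices.
    val conf = new SparkConf().setMaster("local[2]").setAppName("BasicActionsSketch")
    val sc = new SparkContext(conf)
    try {
      val r = compute(sc)
      val (total, count) = r.sumAndCount
      println(s"take: ${r.firstTwo}, top: ${r.topTwo}, takeOrdered: ${r.orderedTwo}")
      println(s"reduce: ${r.sum}, fold: ${r.foldedSum}, average: ${total.toDouble / count}")
    } finally {
      sc.stop()
    }
  }
}
```

Dividing the first component of the aggregate() result by the second gives the average of the RDD, which is a common reason to return a pair rather than a single number.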
Converting Between RDD Types
Some functions are available only on certain types of RDDs, such as mean() and variance() on numeric RDDs or join() on key/value pair RDDs. We will cover these special functions for numeric data in Chapter 6 and pair RDDs in Chapter 4. In Scala and Java, these methods aren't defined on the standard RDD class, so to access this additional functionality we have to make sure we get the correct specialized class.
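In Scala, the specialized class is typically reached through implicit conversions, so the calls look as if they were defined on the RDD itself. The sketch below is illustrative only: it assumes Spark is on the classpath and runs in local mode, the object and method names (SpecializedRddSketch, numericStats, joinExample) are hypothetical, and the sample data is made up. On older Spark 1.x releases, the conversions may need to be brought into scope with `import org.apache.spark.SparkContext._`; that line is left as a comment here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// On older Spark 1.x releases the implicit conversions may require:
// import org.apache.spark.SparkContext._

object SpecializedRddSketch {
  // mean() and variance() are available once the RDD holds Doubles;
  // the RDD is implicitly wrapped in DoubleRDDFunctions.
  def numericStats(sc: SparkContext): (Double, Double) = {
    val doubles = sc.parallelize(Seq(1, 2, 3, 3)).map(_.toDouble)
    (doubles.mean(), doubles.variance())
  }

  // join() is available on RDDs of (key, value) tuples;
  // the RDD is implicitly wrapped in PairRDDFunctions.
  def joinExample(sc: SparkContext): List[(String, (Int, String))] = {
    val counts = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val labels = sc.parallelize(Seq(("a", "alpha")))
    counts.join(labels).collect().toList
  }

  def main(args: Array[String]): Unit = {
    // Master URL and application name are illustrative choices.
    val conf = new SparkConf().setMaster("local[2]").setAppName("SpecializedRddSketch")
    val sc = new SparkContext(conf)
    try {
      val (mean, variance) = numericStats(sc)
      println(s"mean = $mean, variance = $variance")
      println(s"joined = ${joinExample(sc)}")
    } finally {
      sc.stop()
    }
  }
}
```

Calling mean() on an RDD of strings, or join() on an RDD that does not hold tuples, would fail to compile, because no conversion to the specialized class applies.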