| Function name | Purpose | Example | Result |
| --- | --- | --- | --- |
| take(num) | Return num elements from the RDD. | rdd.take(2) | {1, 2} |
| top(num) | Return the top num elements of the RDD. | rdd.top(2) | {3, 3} |
| takeOrdered(num)(ordering) | Return num elements based on provided ordering. | rdd.takeOrdered(2)(myOrdering) | {3, 3} |
| takeSample(withReplacement, num, [seed]) | Return num elements at random. | rdd.takeSample(false, 1) | Nondeterministic |
| reduce(func) | Combine the elements of the RDD together in parallel (e.g., sum). | rdd.reduce((x, y) => x + y) | 9 |
| fold(zero)(func) | Same as reduce() but with the provided zero value. | rdd.fold(0)((x, y) => x + y) | 9 |
| aggregate(zeroValue)(seqOp, combOp) | Similar to reduce() but used to return a different type. | rdd.aggregate((0, 0))((x, y) => (x._1 + y, x._2 + 1), (x, y) => (x._1 + y._1, x._2 + y._2)) | (9, 4) |
| foreach(func) | Apply the provided function to each element of the RDD. | rdd.foreach(func) | Nothing |
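The actions listed above can be tried together in a small local program. The following is a minimal sketch, not a definitive implementation. It assumes the example RDD holds {1, 2, 3, 3}, which is inferred from the listed results rather than stated here, and that myOrdering is the reverse of the natural Int ordering. The object name BasicActionsSketch, the helper sumAndCount, and the local[2] master setting are illustrative choices.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BasicActionsSketch {
  // Assumed ordering for takeOrdered(): reverses the natural Int order,
  // so the "first" elements are the largest ones.
  val myOrdering: Ordering[Int] = Ordering[Int].reverse

  // aggregate() returns a type, (Int, Int), that differs from the element
  // type Int: here a (sum, count) pair.
  def sumAndCount(rdd: RDD[Int]): (Int, Int) =
    rdd.aggregate((0, 0))(
      // seqOp: fold one element into a partition's accumulator
      (acc, value) => (acc._1 + value, acc._2 + 1),
      // combOp: merge the accumulators of two partitions
      (a, b) => (a._1 + b._1, a._2 + b._2))

  def main(args: Array[String]): Unit = {
    // Local mode with two threads; the application name is arbitrary.
    val conf = new SparkConf().setMaster("local[2]").setAppName("basic-actions-sketch")
    val sc = new SparkContext(conf)
    try {
      // Assumed input data, inferred from the results in the table.
      val rdd = sc.parallelize(Seq(1, 2, 3, 3))

      println(rdd.take(2).mkString(", "))                     // 1, 2
      println(rdd.top(2).mkString(", "))                      // 3, 3
      println(rdd.takeOrdered(2)(myOrdering).mkString(", "))  // 3, 3
      println(rdd.takeSample(false, 1).length)                // 1 (the sampled element varies)
      println(rdd.reduce((x, y) => x + y))                    // 9
      println(rdd.fold(0)((x, y) => x + y))                   // 9
      println(sumAndCount(rdd))                               // (9,4)

      // foreach() returns Unit; the order of the printed elements is not guaranteed.
      rdd.foreach(x => println(x))
    } finally {
      sc.stop()
    }
  }
}
```

Dividing the first component of the aggregate() result by the second gives the average (9 / 4 = 2.25), which is a common reason to return a pair rather than a single number.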
Converting Between RDD Types
Some functions are available only on certain types of RDDs, such as mean() and
variance() on numeric RDDs or join() on key/value pair RDDs. We will cover these
special functions for numeric data in Chapter 6 and pair RDDs in Chapter 4. In Scala
and Java, these methods aren't defined on the standard RDD class, so to access this
additional functionality we have to make sure we get the correct specialized class.
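As a rough illustration of the paragraph above, the sketch below calls mean() and variance() on an RDD of doubles and join() on two pair RDDs in Scala. It assumes a recent Spark release, where implicit conversions to the specialized wrapper classes (DoubleRDDFunctions and PairRDDFunctions) are picked up automatically. Releases before 1.3 are understood to require an explicit import org.apache.spark.SparkContext._ for this to compile. The object name and the sample data are made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SpecializedRddSketch {
  def main(args: Array[String]): Unit = {
    // Local mode with two threads; the application name is arbitrary.
    val conf = new SparkConf().setMaster("local[2]").setAppName("specialized-rdd-sketch")
    val sc = new SparkContext(conf)
    try {
      // Numeric RDD: mean() and variance() are not members of RDD itself;
      // they become available through the conversion to DoubleRDDFunctions.
      val nums: RDD[Double] = sc.parallelize(Seq(1.0, 2.0, 3.0, 3.0))
      println(nums.mean())      // about 2.25
      println(nums.variance())  // about 0.6875 (population variance)

      // Pair RDDs: join() becomes available through the conversion to
      // PairRDDFunctions because the element type is a (key, value) tuple.
      val ages: RDD[(String, Int)] = sc.parallelize(Seq(("a", 1), ("b", 2)))
      val tags: RDD[(String, String)] = sc.parallelize(Seq(("a", "x")))
      println(ages.join(tags).collect().toList)  // List((a,(1,x)))
    } finally {
      sc.stop()
    }
  }
}
```

An RDD whose elements are not doubles or tuples, for example RDD[String], does not get these methods, and a call such as mean() on it would fail to compile.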
 