RDD of integers using an accumulator, while at the same time summing the values in the RDD using a reduce() action:
    val count: Accumulator[Int] = sc.accumulator(0)
    val result = sc.parallelize(Array(1, 2, 3))
      .map(i => { count += 1; i })
      .reduce((x, y) => x + y)
    assert(count.value === 3)
    assert(result === 6)
An accumulator variable, count, is created in the first line using the accumulator() method on SparkContext. The map() operation is an identity function with a side effect that increments count. When the result of the Spark job has been computed, the value of the accumulator is accessed by calling value on it.
In this example, we used an Int for the accumulator, but any numeric value type can be used. Spark also provides a way to use accumulators whose result type is different to the type being added (see the accumulable() method on SparkContext), and a way to accumulate values in mutable collections (via accumulableCollection()).
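As a sketch of the second case, assuming a running SparkContext named sc and the Spark 1.x accumulableCollection() API, a mutable set can be filled in from tasks like this:

```scala
import scala.collection.mutable

// Sketch only: assumes a live SparkContext, sc (Spark 1.x API).
// accumulableCollection() wraps a mutable collection so that tasks
// may append elements with +=, while the driver reads the merged
// collection via value after the action completes.
val ints = sc.accumulableCollection(mutable.Set[Int]())
sc.parallelize(Array(1, 2, 3)).foreach(i => ints += i)
assert(ints.value === mutable.Set(1, 2, 3))
```

Here the type being added (Int) differs from the result type (a mutable.Set[Int]), which is the distinction the accumulable forms exist to support.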