The following example counts the number of elements in an RDD of integers using an
accumulator, while at the same time summing the values in the RDD using a reduce()
action:
val count: Accumulator[Int] = sc.accumulator(0)
val result = sc.parallelize(Array(1, 2, 3))
  .map(i => { count += 1; i })
  .reduce((x, y) => x + y)
assert(count.value === 3)
assert(result === 6)
An accumulator variable, count, is created in the first line using the accumulator()
method on SparkContext. The map() operation is an identity function with a side effect
that increments count. When the result of the Spark job has been computed, the value
of the accumulator is accessed by calling value on it.
In this example, we used an Int for the accumulator, but any numeric value type can be
used. Spark also provides a way to use accumulators whose result type is different to the
type being added (see the accumulable() method on SparkContext ), and a way to
accumulate values in mutable collections (via accumulableCollection() ).
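To make the "result type different to the type being added" idea concrete, here is a minimal plain-Scala sketch (no Spark required) of that pattern: integers are added, but the accumulated result is a (sum, count) pair used to compute a mean. The MeanAcc class and its method names are illustrative inventions, not part of the Spark API; in Spark itself you would supply the equivalent add/merge logic via an AccumulableParam to the accumulable() method.

```scala
// Hypothetical sketch of the accumulable pattern: Ints are added,
// but the result type is a running (sum, count) pair.
case class MeanAcc(sum: Long = 0L, count: Long = 0L) {
  // Add a single value to this partition's local accumulator.
  def add(x: Int): MeanAcc = MeanAcc(sum + x, count + 1)
  // Merge two partial accumulators, as the driver does with
  // per-task accumulator updates.
  def merge(other: MeanAcc): MeanAcc =
    MeanAcc(sum + other.sum, count + other.count)
  def mean: Double = if (count == 0) 0.0 else sum.toDouble / count
}

object MeanAccDemo {
  def main(args: Array[String]): Unit = {
    // Simulate two partitions accumulating independently, then merging.
    val part1 = Array(1, 2, 3).foldLeft(MeanAcc())(_ add _)
    val part2 = Array(4, 5).foldLeft(MeanAcc())(_ add _)
    val merged = part1.merge(part2)
    println(merged.mean)
  }
}
```

The key point the sketch illustrates is that the per-element type (Int) and the accumulated type (MeanAcc) are distinct, which is exactly what accumulable() permits and the plain accumulator() method does not.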