RDD of integers using an accumulator, while at the same time summing the values in the RDD using a reduce() action:
    val count: Accumulator[Int] = sc.accumulator(0)
    val result = sc.parallelize(Array(1, 2, 3))
      .map(i => { count += 1; i })
      .reduce((x, y) => x + y)
    assert(count.value === 3)
    assert(result === 6)
An accumulator variable, count, is created in the first line using the accumulator() method on SparkContext. The map() operation is an identity function with a side effect that increments count. When the result of the Spark job has been computed, the value of the accumulator is accessed by calling value on it.
In this example, we used an Int for the accumulator, but any numeric value type can be used. Spark also provides a way to use accumulators whose result type is different to the type being added (see the accumulable() method on SparkContext), and a way to accumulate values in mutable collections (via accumulableCollection()).
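As a sketch of the second case, assuming a running SparkContext named sc and the Spark 1.x accumulableCollection() API, a mutable set can be filled in from tasks like this:

```scala
import scala.collection.mutable

// Sketch only: assumes a live SparkContext, sc (Spark 1.x API).
// accumulableCollection() wraps a mutable collection so that tasks
// may append elements with +=, while the driver reads the merged
// collection via value after the action completes.
val ints = sc.accumulableCollection(mutable.Set[Int]())
sc.parallelize(Array(1, 2, 3)).foreach(i => ints += i)
assert(ints.value === mutable.Set(1, 2, 3))
```

Here the type being added (Int) differs from the result type (a mutable.Set[Int]), which is the distinction the accumulable forms exist to support.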