Since we often want our RDDs in the reverse order, the sortByKey() function takes a
parameter called ascending indicating whether we want it in ascending order (it
defaults to true). Sometimes we want a different sort order entirely, and to support
this we can provide our own comparison function. In Examples 4-19 through 4-21,
we will sort our RDD by converting the integers to strings and using the string
comparison functions.
Example 4-19. Custom sort order in Python, sorting integers as if strings
rdd.sortByKey(ascending=True, numPartitions=None, keyfunc=lambda x: str(x))
Example 4-20. Custom sort order in Scala, sorting integers as if strings
import org.apache.spark.rdd.RDD

val input: RDD[(Int, Venue)] = ...
// Implicit ordering that compares integer keys by their string form
implicit val sortIntegersByString = new Ordering[Int] {
  override def compare(a: Int, b: Int) = a.toString.compare(b.toString)
}
input.sortByKey()
Example 4-21. Custom sort order in Java, sorting integers as if strings
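import java.io.Serializable;
import java.util.Comparator;

// Serializable because Spark may need to ship the comparator to executors
class IntegerComparator implements Comparator<Integer>, Serializable {
  public int compare(Integer a, Integer b) {
    return String.valueOf(a).compareTo(String.valueOf(b));
  }
}
rdd.sortByKey(new IntegerComparator());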
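To make the effect of these sort options concrete, here is a minimal plain-Python sketch that does not use Spark at all. It only mimics the key ordering locally with the built-in sorted() function; the sample pairs are hypothetical and chosen so that numeric and string order differ.

```python
# Plain-Python sketch (no Spark needed) of the orderings discussed above.
# The sample data is hypothetical.
pairs = [(10, "a"), (9, "b"), (100, "c"), (2, "d")]

# Default numeric ordering of the keys.
numeric = sorted(pairs, key=lambda kv: kv[0])

# Keys compared as strings, mirroring keyfunc=lambda x: str(x).
as_strings = sorted(pairs, key=lambda kv: str(kv[0]))

# Reverse numeric order, mirroring ascending=False.
descending = sorted(pairs, key=lambda kv: kv[0], reverse=True)

print([k for k, _ in numeric])      # [2, 9, 10, 100]
print([k for k, _ in as_strings])   # [10, 100, 2, 9]
print([k for k, _ in descending])   # [100, 10, 9, 2]
```

Compared as strings, "100" sorts before "2" because the comparison proceeds character by character, which is why the second result differs from the numeric one.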
Actions Available on Pair RDDs
As with the transformations, all of the traditional actions available on the base RDD
are also available on pair RDDs. Some additional actions are available on pair RDDs
to take advantage of the key/value nature of the data; these are listed in Table 4-3.
Table 4-3. Actions on pair RDDs (example: rdd = {(1, 2), (3, 4), (3, 6)})

Function        Description                                           Example             Result
countByKey()    Count the number of elements for each key.            rdd.countByKey()    {(1, 1), (3, 2)}
collectAsMap()  Collect the result as a map to provide easy lookup.   rdd.collectAsMap()  Map{(1, 2), (3, 6)}
                Only one value is kept per key.
lookup(key)     Return all values associated with the provided key.   rdd.lookup(3)       [4, 6]
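The semantics of these three actions can be sketched in plain Python on the same example data. This is only a local illustration, not Spark code: the helper names count_by_key, collect_as_map, and lookup are hypothetical stand-ins for the RDD methods, and they operate on an ordinary list of pairs.

```python
from collections import Counter

# The example data from Table 4-3, as an ordinary list of (key, value) pairs.
pairs = [(1, 2), (3, 4), (3, 6)]

def count_by_key(pairs):
    """Count how many pairs share each key."""
    return dict(Counter(k for k, _ in pairs))

def collect_as_map(pairs):
    """Build a dictionary from the pairs; a later pair overwrites an
    earlier one that has the same key."""
    return dict(pairs)

def lookup(pairs, key):
    """Return every value whose key matches, in original order."""
    return [v for k, v in pairs if k == key]

print(count_by_key(pairs))    # {1: 1, 3: 2}
print(collect_as_map(pairs))  # {1: 2, 3: 6}
print(lookup(pairs, 3))       # [4, 6]
```

Because a dictionary holds a single value per key, the map-style result in this sketch keeps only one of the two values for key 3, whereas lookup returns both.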
 