Table 18-1. Crunch libraries
Class      Method name(s)    Description
---------  ----------------  ------------------------------------------------------------
Aggregate  length()          Returns the number of elements in a PCollection wrapped in
                             a PObject.
           min()             Returns the smallest value element in a PCollection wrapped
                             in a PObject.
           max()             Returns the largest value element in a PCollection wrapped
                             in a PObject.
           count()           Returns a table of the unique elements of the input
                             PCollection mapped to their counts.
           top()             Returns a table of the top or bottom N key-value pairs in a
                             PTable, ordered by value.
           collectValues()   Groups the values for each unique key in a table into a Java
                             Collection, returning a PTable<K, Collection<V>>.
Cartesian  cross()           Calculates the cross product of two PCollections or PTables.
Channels   split()           Splits a collection of pairs (PCollection<Pair<T, U>>) into
                             a pair of collections (Pair<PCollection<T>, PCollection<U>>).
Cogroup    cogroup()         Groups the elements in two or more PTables by key.
Distinct   distinct()        Creates a new PCollection or PTable with duplicate elements
                             removed.
Join       join()            Performs an inner join on two PTables by key. There are also
                             methods for left, right, and full joins.
Mapred     map()             Runs a mapper (old API) on a PTable<K1, V1> to produce a
                             PTable<K2, V2>.
           reduce()          Runs a reducer (old API) on a PGroupedTable<K1, V1> to
                             produce a PTable<K2, V2>.
Mapreduce  map(), reduce()   Like Mapred, but for the new MapReduce API.
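
The descriptions of Aggregate.count(), Aggregate.collectValues(), and Join.join() in Table 18-1 can be made concrete with a small sketch. The code below is plain Java over java.util collections and does not call Crunch at all: the class CrunchSemanticsSketch and its helper methods are hypothetical names invented for illustration, and in-memory lists of Map.Entry stand in for PCollection and PTable. It only mirrors the behavior the table describes, under the assumption that keys are compared with equals().

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: these helpers are hypothetical and are NOT the Crunch API.
final class CrunchSemanticsSketch {

    // Mirrors the description of count(): unique elements mapped to their counts.
    static <T extends Comparable<T>> Map<T, Long> count(Collection<T> elements) {
        Map<T, Long> counts = new TreeMap<>();
        for (T element : elements) {
            counts.merge(element, 1L, Long::sum);
        }
        return counts;
    }

    // Mirrors the description of collectValues(): the values for each unique key
    // are gathered into a Java Collection.
    static <K extends Comparable<K>, V> Map<K, Collection<V>> collectValues(
            List<Map.Entry<K, V>> table) {
        Map<K, Collection<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> entry : table) {
            grouped.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                   .add(entry.getValue());
        }
        return grouped;
    }

    // Mirrors the description of join(): an inner join keeps only keys present
    // in both tables, pairing each left value with each matching right value.
    static <K, U, V> List<Map.Entry<K, Map.Entry<U, V>>> innerJoin(
            List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right) {
        List<Map.Entry<K, Map.Entry<U, V>>> joined = new ArrayList<>();
        for (Map.Entry<K, U> l : left) {
            for (Map.Entry<K, V> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    joined.add(Map.entry(l.getKey(),
                            Map.entry(l.getValue(), r.getValue())));
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("b", "a", "b")));   // prints {a=1, b=2}

        List<Map.Entry<String, Integer>> table =
                List.of(Map.entry("x", 1), Map.entry("y", 2), Map.entry("x", 3));
        System.out.println(collectValues(table));            // prints {x=[1, 3], y=[2]}

        List<Map.Entry<String, Integer>> left =
                List.of(Map.entry("x", 1), Map.entry("y", 2));
        List<Map.Entry<String, String>> right =
                List.of(Map.entry("x", "red"), Map.entry("z", "blue"));
        // Only key "x" appears in both inputs, so the join has a single row.
        System.out.println(innerJoin(left, right).size());   // prints 1
    }
}
```

The nested loop in innerJoin is chosen for readability over efficiency; it says nothing about how Crunch actually executes a join, which runs as a distributed pipeline rather than in memory.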