Table 18-1. Crunch libraries
Class       Method name(s)    Description
Aggregate   length()          Returns the number of elements in a PCollection wrapped in a PObject.
            min()             Returns the smallest value element in a PCollection wrapped in a PObject.
            max()             Returns the largest value element in a PCollection wrapped in a PObject.
            count()           Returns a table of the unique elements of the input PCollection mapped to their counts.
            top()             Returns a table of the top or bottom N key-value pairs in a PTable, ordered by value.
            collectValues()   Groups the values for each unique key in a table into a Java Collection, returning a PTable<K, Collection<V>>.
Cartesian   cross()           Calculates the cross product of two PCollections or PTables.
Channels    split()           Splits a collection of pairs (PCollection<Pair<T, U>>) into a pair of collections (Pair<PCollection<T>, PCollection<U>>).
Cogroup     cogroup()         Groups the elements in two or more PTables by key.
Distinct    distinct()        Creates a new PCollection or PTable with duplicate elements removed.
Join        join()            Performs an inner join on two PTables by key. There are also methods for left, right, and full joins.
Mapred      map()             Runs a mapper (old API) on a PTable<K1, V1> to produce a PTable<K2, V2>.
            reduce()          Runs a reducer (old API) on a PGroupedTable<K1, V1> to produce a PTable<K2, V2>.
Mapreduce   map(), reduce()   Like Mapred, but for the new MapReduce API.
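The Aggregate methods in Table 18-1 can be made concrete without a cluster using a minimal in-memory sketch in plain Java. This is an illustration under stated assumptions, not Crunch's implementation:

- java.util lists stand in for PCollection.
- A list of Map.Entry pairs stands in for a PTable, so keys may repeat.
- Results are returned directly instead of being wrapped in a PObject.
- The class name AggregateSketch is hypothetical.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical in-memory sketch of the Aggregate methods in Table 18-1.
// Lists stand in for PCollections, lists of Map.Entry for PTables.
// This is NOT the Crunch API and runs no pipeline.
final class AggregateSketch {

    // Analog of length(): the number of elements (Crunch wraps it in a PObject).
    static <T> long length(List<T> coll) {
        return coll.size();
    }

    // Analogs of min() and max(): smallest/largest element by natural order.
    static <T extends Comparable<T>> T min(List<T> coll) {
        return Collections.min(coll);
    }

    static <T extends Comparable<T>> T max(List<T> coll) {
        return Collections.max(coll);
    }

    // Analog of count(): each unique element mapped to its number of occurrences.
    static <T> Map<T, Long> count(List<T> coll) {
        Map<T, Long> counts = new HashMap<>();
        for (T t : coll) {
            counts.merge(t, 1L, Long::sum);
        }
        return counts;
    }

    // Analog of top(): the n entries with the largest values (maximize = true)
    // or the smallest values (maximize = false), ordered by value.
    static <K, V extends Comparable<V>> List<Map.Entry<K, V>> top(
            List<Map.Entry<K, V>> table, int n, boolean maximize) {
        Comparator<Map.Entry<K, V>> byValue = Map.Entry.comparingByValue();
        if (maximize) {
            byValue = byValue.reversed();
        }
        return table.stream()
                .sorted(byValue)
                .limit(n)
                .collect(Collectors.toList());
    }

    // Analog of collectValues(): all values for each key gathered into a Collection.
    static <K, V> Map<K, Collection<V>> collectValues(List<Map.Entry<K, V>> table) {
        Map<K, Collection<V>> grouped = new HashMap<>();
        for (Map.Entry<K, V> e : table) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }
}
```

Two examples of the sketch's behavior:

- count(List.of("a", "b", "a")) yields a map with a→2 and b→1.
- top(table, 2, true) keeps the two entries with the largest values.

In Crunch itself the corresponding calls are deferred and only execute when the pipeline runs.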
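The shape changes performed by split(), cross(), and distinct() can be shown with the same kind of hypothetical in-memory sketch. The assumptions are:

- Map.Entry stands in for Crunch's Pair.
- java.util lists stand in for PCollections.
- The class name CollectionOpsSketch is invented for illustration.
- Map.entry rejects nulls, so this sketch does too.

```java
import java.util.*;

// Hypothetical in-memory sketch of Channels.split(), Cartesian.cross() and
// Distinct.distinct() from Table 18-1. Lists stand in for PCollections and
// Map.Entry stands in for Crunch's Pair. This is NOT the Crunch API.
final class CollectionOpsSketch {

    // Analog of split(): a collection of pairs becomes a pair of collections,
    // the first holding every first element and the second every second element.
    static <T, U> Map.Entry<List<T>, List<U>> split(List<Map.Entry<T, U>> pairs) {
        List<T> firsts = new ArrayList<>();
        List<U> seconds = new ArrayList<>();
        for (Map.Entry<T, U> p : pairs) {
            firsts.add(p.getKey());
            seconds.add(p.getValue());
        }
        return Map.entry(firsts, seconds);
    }

    // Analog of cross(): every left element paired with every right element,
    // giving left.size() * right.size() pairs.
    static <A, B> List<Map.Entry<A, B>> cross(List<A> left, List<B> right) {
        List<Map.Entry<A, B>> product = new ArrayList<>();
        for (A a : left) {
            for (B b : right) {
                product.add(Map.entry(a, b));
            }
        }
        return product;
    }

    // Analog of distinct(): duplicates removed. Keeping first-occurrence order
    // is a choice of this sketch, not something to rely on in Crunch.
    static <T> List<T> distinct(List<T> coll) {
        return new ArrayList<>(new LinkedHashSet<>(coll));
    }
}
```

The cross product grows multiplicatively: two elements crossed with three give six pairs. That growth is why the operation is worth using sparingly on large collections.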
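The difference between join() (an inner join by key) and cogroup() (grouping by key) can be sketched in memory for the two-table case. The assumptions are:

- Tables are lists of Map.Entry records, so keys may repeat.
- The sketch handles exactly two tables, although cogroup() accepts two or more.
- The class name JoinCogroupSketch is hypothetical.
- The right table is indexed in a hash map. That lookup strategy, and the deterministic output order, are choices of this sketch and say nothing about how Crunch executes joins.

```java
import java.util.*;

// Hypothetical in-memory sketch of Join.join() (inner join) and a two-table
// Cogroup.cogroup() from Table 18-1. Tables are lists of Map.Entry records
// (keys may repeat). This is NOT the Crunch API.
final class JoinCogroupSketch {

    // Groups a table's values by key, preserving first-seen key order.
    private static <K, T> Map<K, List<T>> group(List<Map.Entry<K, T>> table) {
        Map<K, List<T>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, T> e : table) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Analog of join(): for each key present in BOTH tables, one
    // (key, (leftValue, rightValue)) record per matching pair of values.
    // Keys found in only one table are dropped (inner-join semantics).
    static <K, U, V> List<Map.Entry<K, Map.Entry<U, V>>> join(
            List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right) {
        Map<K, List<V>> rightByKey = group(right);
        List<Map.Entry<K, Map.Entry<U, V>>> joined = new ArrayList<>();
        for (Map.Entry<K, U> l : left) {
            for (V v : rightByKey.getOrDefault(l.getKey(), List.of())) {
                joined.add(Map.entry(l.getKey(), Map.entry(l.getValue(), v)));
            }
        }
        return joined;
    }

    // Analog of cogroup() for two tables: every key from either table maps to
    // (all left values, all right values); one side may be empty.
    static <K, U, V> Map<K, Map.Entry<List<U>, List<V>>> cogroup(
            List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right) {
        Map<K, List<U>> lefts = group(left);
        Map<K, List<V>> rights = group(right);
        Set<K> keys = new LinkedHashSet<>(lefts.keySet());
        keys.addAll(rights.keySet());
        Map<K, Map.Entry<List<U>, List<V>>> grouped = new LinkedHashMap<>();
        for (K key : keys) {
            grouped.put(key, Map.entry(lefts.getOrDefault(key, List.of()),
                                       rights.getOrDefault(key, List.of())));
        }
        return grouped;
    }
}
```

The key contrast is what happens to unmatched keys:

- join() silently drops a key that appears in only one table.
- cogroup() keeps every key and reports an empty group for the missing side.

The left, right, and full joins mentioned in the table sit between these two behaviors.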