| Class | Method name(s) | Description |
|---|---|---|
| PTables | asPTable() | Converts a PCollection<Pair<K, V>> to a PTable<K, V>. |
| | keys() | Returns a PTable's keys as a PCollection. |
| | values() | Returns a PTable's values as a PCollection. |
| | mapKeys() | Applies a map function to all the keys in a PTable, leaving the values unchanged. |
| | mapValues() | Applies a map function to all the values in a PTable or PGroupedTable, leaving the keys unchanged. |
| Sample | sample() | Creates a sample of a PCollection by choosing each element independently with a specified probability. |
| | reservoirSample() | Creates a sample of a PCollection of a specified size, where each element is equally likely to be included. |
| SecondarySort | sortAndApply() | Sorts a PTable<K, Pair<V1, V2>> by K then V1, then applies a function to give an output PCollection or PTable. |
| Set | difference() | Returns a PCollection that is the set difference of two PCollections. |
| | intersection() | Returns a PCollection that is the set intersection of two PCollections. |
| | comm() | Returns a PCollection of triples that classifies each element from two PCollections by whether it is only in the first collection, only in the second collection, or in both collections. (Similar to the Unix comm command.) |
| Shard | shard() | Creates a PCollection that contains exactly the same elements as the input PCollection, but is partitioned (sharded) across a specified number of files. |
| Sort | sort() | Performs a total sort on a PCollection in the natural order of its elements in ascending (the default) or descending order. There are also methods to sort PTables by key, and collections of Pairs or tuples by a subset of their columns in a specified order. |
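The two Sample entries differ in what they guarantee: sample() fixes the inclusion probability, so the size of the result varies, while reservoirSample() fixes the size of the result. The following is a minimal plain-Java sketch of those two semantics on an in-memory collection. It does not use the Crunch API, and the class and method names (SamplingSketch, bernoulliSample) are hypothetical names chosen for illustration. The reservoir version uses the classic single-pass "Algorithm R", which is an assumption about the technique and not necessarily how Crunch implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Plain-Java illustration only; this is NOT Crunch code (hypothetical names).
final class SamplingSketch {

    // Independent sampling: keep each element with probability p,
    // so the number of elements returned varies from run to run.
    static <T> List<T> bernoulliSample(Iterable<T> input, double p, Random rng) {
        List<T> out = new ArrayList<>();
        for (T item : input) {
            if (rng.nextDouble() < p) {
                out.add(item);
            }
        }
        return out;
    }

    // Reservoir sampling (Algorithm R): returns min(k, n) elements in one pass,
    // with every input element equally likely to end up in the result.
    static <T> List<T> reservoirSample(Iterable<T> input, int k, Random rng) {
        List<T> reservoir = new ArrayList<>();
        int seen = 0;
        for (T item : input) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k elements.
                reservoir.add(item);
            } else {
                // Pick a slot in [0, seen); the new item replaces an
                // existing one only if the slot falls inside the reservoir.
                int j = rng.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Random rng = new Random();
        System.out.println("independent: " + bernoulliSample(data, 0.3, rng));
        System.out.println("reservoir:   " + reservoirSample(data, 3, rng));
    }
}
```

Running main prints a sample of varying length on the first line and a sample of exactly three elements on the second.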
One of the most powerful things about Crunch is that if the function you need is not
provided, it is simple to write it yourself, typically in a few lines of Java. For an
example of a general-purpose function (for finding the unique values in a PTable), see
Example 18-2.