Database Reference
In-Depth Information
batch to a single partition. Different batches may be sent to a different
partition.
• There is also a
partition
function that takes an implementation of
CustomStreamGrouping
to implement custom partition methods.
In addition to these partition methods, Trident also implements a special
type of partition operation called
groupBy
. This partition operation acts
very much like the reducer phase in a Map-Reduce job and is probably the
most commonly used partition in Trident.
The
groupBy
operation first applies a
partitionBy
partition according to
the fields specified in the
groupBy
operation. Within each partition, values
with identical hash values are then grouped together for further processing
in a
GroupedStream
. Aggregators can then be applied directly to these
groups.
Aggregation
Trident has two methods of aggregation on streams,
aggregate
and
persistent-Aggregate
. The
aggregate
method applied to a stream
with a function that implements
ReducerAggregator
or
Aggregator
effectively performs a
global
partition on the data, causing all tuples to be
sent tothesame partition foraggregation onaper-batch basis. An aggregate
method that is given a
CombinerAggregator
first computes an
intermediate aggregate for each partition and then performs a
global
partition on the output of these intermediate aggregates.
Thebuilt-inaggregators
Count
and
Sum
areboth
CombinerAggregators
.
Other custom aggregation methods can be implemented by extending the
BaseAggregator
class. An example of this is shown in the next section
when a partition local aggregator is implemented.
The other aggregation method,
persistentAggregate
, works like an
aggregate except that it records its output into a
State
object. These
State
objects often work with the
Spout
in a Trident topology to ensure features
like transactional semantics.
The
persistentAggregate
method returns a
TridentState
object
rather than a
Stream
object. This object has a method
newValuesStream
that emits a tuple every time a key's state changes. This is useful for