Database Reference
In-Depth Information
Operations related to stream repartitioning
As the name suggests, these stream repartitioning operations are related to the execution of
functions to change the tuple partitions across the tasks. These operations involve network
traffic and the results redistribute the stream, and can result in changes to an overall parti-
tioning strategy thus impacting a number of partitions.
Here are the repartitioning functions provided by the Trident API:
Shuffle : This executes a rebalance kind of functionality and it employs a ran-
dom round robin algorithm for an even redistribution of tuples across the parti-
tions.
Broadcast : This does what the name suggests; it broadcasts and transmits each
tuple to every target partition.
partitionBy : This function works on hashing and mod on a set of specified
fields so that the same fields are always moved to the same partitions. As an ana-
logy, one can assume that the functioning of this is similar to the fields grouping
that we learned about initially in Storm groupings.
global : This is identical to the global grouping of streams in a Storm, and in this
case, the same partition is chosen for all the batches.
batchGlobal : All tuples in a batch are sent to the same partition (so they kind
of stick together), but different batches can be delivered to different partitions.
Search WWH ::




Custom Search