Database Reference
In-Depth Information
Merge and join
The merges and joins APIs provide interfaces for merging and joining various streams to-
gether. This is possible using a variety of ways provided as follows:
Merge : As the name suggests, merge merges two or more streams together and
emits the merged stream as the output field of the first stream:
myTridentTopology.merge(stream1,stream2,stream3);
Join : This operation works as the traditional SQL join function, but with the
difference that it applies to small batches instead of entire infinite streams coming
out of the spout
For example, consider a join function where Stream 1 has fields such as ["key",
"val1", "val2"] and Stream 2 has ["x", "val1"] , and from these functions we
execute the following code:
myTridentTopology.join(stream1, new Fields("key"), stream2,
new Fields("x"), new Fields("key", "a", "b", "c"));
As a result, Stream 1 and Stream 2 would be joined using key and x , wherein key would
join the field for Stream 1 and x would join the field for Stream 2.
The output tuples emitted from the join would have the following:
• The list of all the join fields; in our case, it would be key from Stream 1 and x
from Stream 2.
• A list of all the fields that are not join fields from all the streams involved in the
join operation in the same order as they are passed to the join operation. In our
case, it's a and b respectively for val1 and val2 of Stream 1, and c for val1
from Stream 2 (note that this step also removes the ambiguity of field names if any
ambiguity is present within the stream, in our case val1 field was ambiguous
between both the streams).
When operations like join happen on streams that are being fed in the topology from differ-
ent spouts, the framework ensures that the spouts are synchronized with respect to batch
emission, so that every join computation can include tuples from a batch of each spout.
Search WWH ::




Custom Search