Database Reference
In-Depth Information
Merge and join
The merges and joins APIs provide interfaces for merging and joining various streams to-
gether. This is possible using a variety of ways provided as follows:
•
Merge
: As the name suggests,
merge
merges two or more streams together and
emits the merged stream as the output field of the first stream:
myTridentTopology.merge(stream1,stream2,stream3);
•
Join
: This operation works as the traditional SQL
join
function, but with the
difference that it applies to small batches instead of entire infinite streams coming
out of the spout
For example, consider a join function where Stream 1 has fields such as
["key",
"val1", "val2"]
and Stream 2 has
["x", "val1"]
, and from these functions we
execute the following code:
myTridentTopology.join(stream1, new Fields("key"), stream2,
new Fields("x"), new Fields("key", "a", "b", "c"));
As a result, Stream 1 and Stream 2 would be joined using
key
and
x
, wherein
key
would
join the field for Stream 1 and
x
would join the field for Stream 2.
The output tuples emitted from the join would have the following:
• The list of all the join fields; in our case, it would be
key
from Stream 1 and
x
from Stream 2.
• A list of all the fields that are not join fields from all the streams involved in the
join operation in the same order as they are passed to the
join
operation. In our
case, it's
a
and
b
respectively for
val1
and
val2
of Stream 1, and
c
for
val1
from Stream 2 (note that this step also removes the ambiguity of field names if any
ambiguity is present within the stream, in our case
val1
field was ambiguous
between both the streams).
When operations like join happen on streams that are being fed in the topology from differ-
ent spouts, the framework ensures that the spouts are synchronized with respect to batch
emission, so that every join computation can include tuples from a batch of each spout.
Search WWH ::
Custom Search