The preceding code snippet ensures that the spout myFixedSpout cycles over the set of sentences added as values. This gives us an endless flow of data streaming into the topology and enough data points to exercise all the micro-batching functions we intend to.
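For reference, a spout of this kind is usually built with Trident's FixedBatchSpout test utility. A minimal sketch, assuming the storm.trident.testing package of Storm 0.9.x; the sample sentences are illustrative placeholders:
import storm.trident.testing.FixedBatchSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
        new Values("the cow jumped over the moon"),   // sample sentences;
        new Values("the man went to the store"),      // replace with your own data
        new Values("four score and seven years ago"));
spout.setCycle(true); // cycle endlessly over the sentences for a continuous stream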
Now that we have ensured a continuous input stream, let's look at the following snippet:
//creating a new Trident topology
TridentTopology myTridentTopology = new TridentTopology();
//adding a spout and configuring the fields and query
TridentState myWordCounts =
    myTridentTopology.newStream("myFixedSpout", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .parallelismHint(6);
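The Split function used in the each call is a user-defined Trident function that is not shown in the snippet. A minimal sketch, following the standard Trident word-count example (class and package names assume Storm 0.9.x):
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;
import backtype.storm.tuple.Values;

public class Split extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        // emit one tuple per whitespace-separated word in the incoming sentence
        for (String word : tuple.getString(0).split(" ")) {
            collector.emit(new Values(word));
        }
    }
}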
Now let's walk through the main snippet line by line to see how it works.
Here we start by creating a Trident topology object, which in turn gives the developer access to the Trident interfaces.
This topology, myTridentTopology, has a method called newStream that enables it to create a new stream to read data from a source.
Here we use myFixedSpout from the preceding snippet, which cycles through a predefined set of sentences. In a real-life production scenario, we would use a spout that reads streams off a queue (such as RabbitMQ, Kafka, and so on), as sketched below.
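For instance, with the storm-kafka module a Trident Kafka spout can be dropped in where myFixedSpout was used. A hedged sketch; the Zookeeper address and topic name are hypothetical placeholders:
import storm.kafka.ZkHosts;
import storm.kafka.trident.TridentKafkaConfig;
import storm.kafka.trident.OpaqueTridentKafkaSpout;

// hypothetical Zookeeper ensemble and Kafka topic
ZkHosts zkHosts = new ZkHosts("zkhost.example.com:2181");
TridentKafkaConfig kafkaConfig = new TridentKafkaConfig(zkHosts, "sentences");
OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(kafkaConfig);
// then: myTridentTopology.newStream("myKafkaSpout", kafkaSpout) ...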
Now for the micro-batching: who does it, and how? The Trident framework stores the state for each source (in effect, it remembers what input data it has consumed so far). This state is saved in the Zookeeper cluster. The tag given to the spout in the preceding code ("myFixedSpout") names a znode that is created in the Zookeeper cluster to hold this state metadata.
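If you want to keep this transactional metadata in a separate Zookeeper ensemble rather than the one Storm itself uses, Storm exposes standard configuration keys for it. A minimal sketch using backtype.storm.Config; the host names are hypothetical:
import backtype.storm.Config;
import java.util.Arrays;

Config conf = new Config();
// point Trident's transactional (spout state) metadata at a dedicated ensemble;
// the hosts below are placeholders
conf.put(Config.TRANSACTIONAL_ZOOKEEPER_SERVERS, Arrays.asList("zk1.example.com", "zk2.example.com"));
conf.put(Config.TRANSACTIONAL_ZOOKEEPER_PORT, 2181);
conf.put(Config.TRANSACTIONAL_ZOOKEEPER_ROOT, "/transactional");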
This metadata is stored in small batches, where the batch size varies with the speed of incoming tuples; it could be a few hundred to millions of tuples, depending on the event transactions per second (tps).