The preceding code snippet ensures that the spout myFixedSpout cycles over the set of sentences added as values. This gives us an endless flow of data streaming into the topology and enough data points to exercise all the micro-batching functions we intend to.
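For reference, a spout of this kind is usually built with Trident's FixedBatchSpout test utility. A minimal sketch, assuming the storm.trident.testing package of Storm 0.9.x; the sample sentences are illustrative placeholders:
import storm.trident.testing.FixedBatchSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
        new Values("the cow jumped over the moon"),   // sample sentences;
        new Values("the man went to the store"),      // replace with your own data
        new Values("four score and seven years ago"));
spout.setCycle(true); // cycle endlessly over the sentences for a continuous stream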
Now that we have ensured a continuous input stream, let's look at the following snippet:
//creating a new Trident topology
TridentTopology myTridentTopology = new TridentTopology();
//adding a spout and configuring the fields and query
TridentState myWordCounts =
    myTridentTopology.newStream("myFixedSpout", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .parallelismHint(6);
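The Split function used in the each call is a user-defined Trident function that is not shown in the snippet. A minimal sketch, following the standard Trident word-count example (class and package names assume Storm 0.9.x):
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;
import backtype.storm.tuple.Values;

public class Split extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        // emit one tuple per whitespace-separated word in the incoming sentence
        for (String word : tuple.getString(0).split(" ")) {
            collector.emit(new Values(word));
        }
    }
}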
Now let's walk through the main snippet line by line to see how it works.
Here we start by creating a Trident topology object, which in turn gives the developer access to the Trident interfaces.
This topology, myTridentTopology, has a method called newStream that enables it to create a new stream to read data from a source.
Here we use myFixedSpout from the preceding snippet, which cycles through a predefined set of sentences. In a real-life production scenario, we would use a spout that reads streams off a queue (such as RabbitMQ, Kafka, and so on), as sketched below.
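For instance, with the storm-kafka module a Trident Kafka spout can be dropped in where myFixedSpout was used. A hedged sketch; the Zookeeper address and topic name are hypothetical placeholders:
import storm.kafka.ZkHosts;
import storm.kafka.trident.TridentKafkaConfig;
import storm.kafka.trident.OpaqueTridentKafkaSpout;

// hypothetical Zookeeper ensemble and Kafka topic
ZkHosts zkHosts = new ZkHosts("zkhost.example.com:2181");
TridentKafkaConfig kafkaConfig = new TridentKafkaConfig(zkHosts, "sentences");
OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(kafkaConfig);
// then: myTridentTopology.newStream("myKafkaSpout", kafkaSpout) ...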
Now for the micro-batching: who does it, and how? The Trident framework stores the state for each source (in effect, it remembers what input data it has consumed so far). This state is saved in the Zookeeper cluster. The tag given to the spout in the preceding code ("myFixedSpout") names a znode that is created in the Zookeeper cluster to hold this state metadata.
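If you want to keep this transactional metadata in a separate Zookeeper ensemble rather than the one Storm itself uses, Storm exposes standard configuration keys for it. A minimal sketch using backtype.storm.Config; the host names are hypothetical:
import backtype.storm.Config;
import java.util.Arrays;

Config conf = new Config();
// point Trident's transactional (spout state) metadata at a dedicated ensemble;
// the hosts below are placeholders
conf.put(Config.TRANSACTIONAL_ZOOKEEPER_SERVERS, Arrays.asList("zk1.example.com", "zk2.example.com"));
conf.put(Config.TRANSACTIONAL_ZOOKEEPER_PORT, 2181);
conf.put(Config.TRANSACTIONAL_ZOOKEEPER_ROOT, "/transactional");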
This metadata is stored in small batches, where the batch size varies with the speed of incoming tuples; it could be a few hundred to millions of tuples, depending on the event transactions per second (tps).