Each DRPC query request is treated as its own mini batch-processing job whose input is a single tuple representing the request. In our case, the argument is a list of words separated by spaces.
Here are the steps that are being executed in the previous code snippet:
• We split the argument stream into its constituent words; for example, my argument, storm trident topology, is split into the individual words storm, trident, and topology
• Then the incoming stream is grouped by word
• Next, the state query operator is used to query the Trident state object that was generated by the first part of the topology:
◦ The state query takes in the word counts computed by an earlier section of the topology.
◦ It then executes the function specified as part of the DRPC request to query the data.
◦ In this case, my topology executes the MapGet function on the query to get the count of each word; the DRPC stream, in our case, is grouped in exactly the same manner as the TridentState in the preceding section of the topology. This arrangement guarantees that all my word-count queries for each word are directed to the same Trident state partition of the TridentState object that manages the updates for that word.
• FilterNull ensures that words that don't have a count are filtered out
• The Sum aggregator then sums all the counts to produce the result, which is automatically returned to the awaiting client
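The steps above can be sketched in plain Java, with a small in-memory map standing in for the TridentState (the WORD_COUNTS contents and the class name here are hypothetical illustrations, not Trident API):

```java
import java.util.*;

public class DrpcQuerySketch {
    // Hypothetical pre-computed word-count state, standing in for the TridentState
    static final Map<String, Long> WORD_COUNTS = Map.of(
            "storm", 3L, "trident", 5L, "topology", 2L);

    // Simulates the DRPC query pipeline: split -> stateQuery(MapGet) -> FilterNull -> Sum
    static long queryWordCounts(String drpcArgs) {
        return Arrays.stream(drpcArgs.split(" "))   // split the argument into words
                .map(WORD_COUNTS::get)              // stateQuery with MapGet per word
                .filter(Objects::nonNull)           // FilterNull drops words with no count
                .mapToLong(Long::longValue)
                .sum();                             // Sum aggregator over the counts
    }

    public static void main(String[] args) {
        System.out.println(queryWordCounts("storm trident topology")); // 10
        System.out.println(queryWordCounts("storm unknown"));          // 3
    }
}
```

In the real topology, each stage runs as a distributed stream operation and the grouping routes each word to the correct state partition; this sketch only mirrors the data flow of one request.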
Having understood the execution of the developer-written code, let's take a look at what Trident handles as boilerplate and what happens automatically behind the scenes when the framework executes.
• We have two operations in our Trident word-count topology that read from or write to state: persistentAggregate and stateQuery. Trident automatically batches these operations against that state. For instance, if the current processing requires 10 reads and writes to the database, Trident automatically batches them together into one read and one write. This gives you both performance and ease of computation, with the optimization handled by the framework.
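The effect of that batching can be illustrated with a toy state object that counts simulated database round trips (the class, store contents, and counter are hypothetical; real Trident state implementations expose multiGet/multiPut-style batched operations):

```java
import java.util.*;

public class BatchedStateSketch {
    // Hypothetical key-value store standing in for the external database
    static final Map<String, Long> store = new HashMap<>(
            Map.of("storm", 3L, "trident", 5L, "topology", 2L));
    static int roundTrips = 0;  // counts simulated database round trips

    // Unbatched access: one round trip per key
    static Long get(String key) {
        roundTrips++;
        return store.get(key);
    }

    // Batched access: a whole set of keys in a single round trip, which is
    // analogous to what Trident's state layer arranges automatically for the
    // reads and writes issued by stateQuery and persistentAggregate in a batch
    static Map<String, Long> multiGet(Collection<String> keys) {
        roundTrips++;
        Map<String, Long> out = new HashMap<>();
        for (String k : keys) out.put(k, store.get(k));
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("storm", "trident", "topology");
        for (String w : words) get(w);
        System.out.println(roundTrips);  // 3 round trips unbatched
        roundTrips = 0;
        multiGet(words);
        System.out.println(roundTrips);  // 1 round trip batched
    }
}
```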
• Trident aggregators are another highly efficient and optimized component of the framework. They don't work by transferring all the tuples to one machine
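Instead, a combiner-style aggregator computes a partial result on each partition and ships only those small partials to a single task for the final combine. A minimal sketch of that two-phase sum (the class and method names are illustrative, not Trident's CombinerAggregator interface):

```java
import java.util.*;
import java.util.stream.*;

public class CombinerSumSketch {
    // Phase 1: each partition pre-aggregates its own tuples locally...
    static long partialSum(List<Long> partition) {
        return partition.stream().mapToLong(Long::longValue).sum();
    }

    // Phase 2: ...and only the small partial results cross the network
    // to one task, which combines them into the final answer
    static long combine(List<Long> partials) {
        return partials.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<List<Long>> partitions = List.of(
                List.of(1L, 2L), List.of(3L), List.of(4L, 5L));
        List<Long> partials = partitions.stream()
                .map(CombinerSumSketch::partialSum)
                .collect(Collectors.toList());
        System.out.println(combine(partials)); // 15
    }
}
```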