Each DRPC query request is treated as its own mini batch-processing job whose input is a single tuple representing the request. In our case, the argument is a list of words separated by spaces.
Here are the steps that are being executed in the previous code snippet:
• We split the argument stream into its constituent words; for example, my argument, storm trident topology, is split into the individual words storm, trident, and topology
• Then the incoming stream is grouped by word
• Next, the state query operator is used to query the Trident state object that was generated by the first part of the topology:
◦ The state query takes in the word counts computed by an earlier section of the topology.
◦ It then executes the function specified as part of the DRPC request to query the data.
◦ In this case, my topology executes the MapGet function on the query to get the count of each word; the DRPC stream, in our case, is grouped in exactly the same manner as the TridentState in the preceding section of the topology. This arrangement guarantees that all my word-count queries for each word are directed to the same Trident state partition of the TridentState object that manages the updates for that word.
• FilterNull ensures that words that don't have a count are filtered out
• The Sum aggregator then sums all the counts to produce the result, which is automatically returned to the awaiting client
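The steps above can be sketched in plain Java, with a small in-memory map standing in for the TridentState (the WORD_COUNTS contents and the class name here are hypothetical illustrations, not Trident API):

```java
import java.util.*;

public class DrpcQuerySketch {
    // Hypothetical pre-computed word-count state, standing in for the TridentState
    static final Map<String, Long> WORD_COUNTS = Map.of(
            "storm", 3L, "trident", 5L, "topology", 2L);

    // Simulates the DRPC query pipeline: split -> stateQuery(MapGet) -> FilterNull -> Sum
    static long queryWordCounts(String drpcArgs) {
        return Arrays.stream(drpcArgs.split(" "))   // split the argument into words
                .map(WORD_COUNTS::get)              // stateQuery with MapGet per word
                .filter(Objects::nonNull)           // FilterNull drops words with no count
                .mapToLong(Long::longValue)
                .sum();                             // Sum aggregator over the counts
    }

    public static void main(String[] args) {
        System.out.println(queryWordCounts("storm trident topology")); // 10
        System.out.println(queryWordCounts("storm unknown"));          // 3
    }
}
```

In the real topology, each stage runs as a distributed stream operation and the grouping routes each word to the correct state partition; this sketch only mirrors the data flow of one request.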
Having understood the execution of the developer-written code, let's take a look at what Trident handles as boilerplate and what happens automatically behind the scenes when the framework executes.
• We have two operations in our Trident word-count topology that read from or write to state: persistentAggregate and stateQuery. Trident automatically batches these operations against that state. For instance, if the current processing requires 10 reads and writes to the database, Trident automatically batches them together into one read and one write. This gives you both performance and ease of computation, with the optimization handled by the framework.
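The effect of that batching can be illustrated with a toy state object that counts simulated database round trips (the class, store contents, and counter are hypothetical; real Trident state implementations expose multiGet/multiPut-style batched operations):

```java
import java.util.*;

public class BatchedStateSketch {
    // Hypothetical key-value store standing in for the external database
    static final Map<String, Long> store = new HashMap<>(
            Map.of("storm", 3L, "trident", 5L, "topology", 2L));
    static int roundTrips = 0;  // counts simulated database round trips

    // Unbatched access: one round trip per key
    static Long get(String key) {
        roundTrips++;
        return store.get(key);
    }

    // Batched access: a whole set of keys in a single round trip, which is
    // analogous to what Trident's state layer arranges automatically for the
    // reads and writes issued by stateQuery and persistentAggregate in a batch
    static Map<String, Long> multiGet(Collection<String> keys) {
        roundTrips++;
        Map<String, Long> out = new HashMap<>();
        for (String k : keys) out.put(k, store.get(k));
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("storm", "trident", "topology");
        for (String w : words) get(w);
        System.out.println(roundTrips);  // 3 round trips unbatched
        roundTrips = 0;
        multiGet(words);
        System.out.println(roundTrips);  // 1 round trip batched
    }
}
```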
• Trident aggregators are another highly efficient and optimized component of the framework. They don't work by transferring all the tuples to one machine
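Instead, a combiner-style aggregator computes a partial result on each partition and ships only those small partials to a single task for the final combine. A minimal sketch of that two-phase sum (the class and method names are illustrative, not Trident's CombinerAggregator interface):

```java
import java.util.*;
import java.util.stream.*;

public class CombinerSumSketch {
    // Phase 1: each partition pre-aggregates its own tuples locally...
    static long partialSum(List<Long> partition) {
        return partition.stream().mapToLong(Long::longValue).sum();
    }

    // Phase 2: ...and only the small partial results cross the network
    // to one task, which combines them into the final answer
    static long combine(List<Long> partials) {
        return partials.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<List<Long>> partitions = List.of(
                List.of(1L, 2L), List.of(3L), List.of(4L, 5L));
        List<Long> partials = partitions.stream()
                .map(CombinerSumSketch::partialSum)
                .collect(Collectors.toList());
        System.out.println(combine(partials)); // 15
    }
}
```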