Database Reference
In-Depth Information
Fields("tweeter"), new MapGet(), new Fields("followers"))
.parallelismHint(200)
.each(new Fields("followers"), new ExpandList(),
new Fields("follower"))
.groupBy(new Fields("follower"))
.aggregate(new One(), new Fields("one"))
.parallelismHint(20)
.aggregate(new Count(), new Fields("reach"));
In the preceding topology, we perform the following steps:
1. Create a TridentState object for both data banks (URL to the originator
Bank A and users to follow Bank B).
2. The newStaticState method is used for the instantiation of state objects for
data banks; we have the capability to run the DRPC queries over the source states
created earlier.
3. In execution, when the reach of a URL is to be computed, we perform a query us-
ing the Trident state for data bank A to fetch the list of all the users who have ever
tweeted with this URL.
4. The ExpandList function creates and emits one tuple for each of the tweeters
of the URL in query.
5. Next, we fetch the follower of each tweeter fetched previously. This step needs
the highest degree of parallelism, thus we use shuffle grouping here for even load
distribution across all instances of the bolt. In our reach topology, this is the most
intense compute step.
6. Once we have the list of followers of the tweeter of the URL, we execute an oper-
ation analog to filter unique followers only.
7. We arrive at unique followers by grouping them together and then using the one
aggregator. The latter simply emits 1 for each group and in the next step all these
are counted together to arrive at the reach.
8. Then we count the followers (unique) thus arriving at the reach of the URL.
Search WWH ::




Custom Search