Examples and illustrations
Another popular out-of-the-box Trident example is the reach topology, a pure DRPC
topology that computes the reach of a URL on demand. Let's first understand some of
the jargon before we delve deeper.
Reach is the total number of unique Twitter users exposed to a URL.
Reach computation is a multistep process made up of the following steps:
• Get all the users who have ever tweeted a URL
• Fetch the followers of each of these users
• Merge the follower sets fetched previously into a single set of unique users
• Count the set
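The four steps above can be sketched on a single machine with plain Java collections. This is only an illustration of the logic, not the Trident implementation: the class name, the sample URL, and the user names are all hypothetical, and the two in-memory maps stand in for data sources that would be far too large for one machine in practice.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReachSketch {
    // Hypothetical sample data: URL -> users who tweeted it.
    static final Map<String, List<String>> URL_TO_TWEETERS = Map.of(
        "http://example.com/a", List.of("alice", "bob"));

    // Hypothetical sample data: user -> that user's followers.
    static final Map<String, List<String>> TWEETER_TO_FOLLOWERS = Map.of(
        "alice", List.of("carol", "dave", "erin"),
        "bob", List.of("dave", "frank"));

    static int reach(String url) {
        // The set removes followers who appear under more than one tweeter.
        Set<String> exposed = new HashSet<>();
        // Step 1: get all the users who tweeted the URL.
        for (String tweeter : URL_TO_TWEETERS.getOrDefault(url, List.of())) {
            // Steps 2 and 3: fetch each tweeter's followers and merge them.
            exposed.addAll(TWEETER_TO_FOLLOWERS.getOrDefault(tweeter, List.of()));
        }
        // Step 4: count the merged set.
        return exposed.size();
    }

    public static void main(String[] args) {
        System.out.println(reach("http://example.com/a")); // prints 4
    }
}
```

In this sample, dave follows both alice and bob but is counted once, which is why the result is 4 rather than 5.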
Looking at the skeletal algorithm outlined previously, you can see that it is beyond
the capability of a single machine, and we'd need a distributed compute engine to
achieve it. It's an ideal candidate for the Storm Trident framework, as you have the
capability to execute highly parallel computations at each step across the cluster.
Our Trident reach topology pulls data from two large data banks:
• Bank A is the URL-to-originator bank, in which each URL is stored
along with the names of the users who tweeted it
• Bank B is the user-to-follower bank; this data bank holds the user-to-follower
mapping for all Twitter users
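To make the shape of the two banks concrete, here is a minimal sketch that builds both as in-memory maps from flat (key, value) records. The record format, the class and method names, and the sample data are assumptions made for illustration; the source does not say how the banks are actually populated or stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DataBanksSketch {
    // Groups (key, value) pairs into key -> list of values, keeping input order.
    static Map<String, List<String>> index(List<String[]> pairs) {
        Map<String, List<String>> bank = new HashMap<>();
        for (String[] pair : pairs) {
            bank.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return bank;
    }

    // Bank A (hypothetical data): URL -> users who tweeted it.
    static Map<String, List<String>> bankA() {
        return index(List.of(
            new String[] {"http://example.com/a", "alice"},
            new String[] {"http://example.com/a", "bob"},
            new String[] {"http://example.com/b", "bob"}));
    }

    // Bank B (hypothetical data): user -> followers of that user.
    static Map<String, List<String>> bankB() {
        return index(List.of(
            new String[] {"alice", "carol"},
            new String[] {"alice", "dave"},
            new String[] {"bob", "dave"}));
    }

    public static void main(String[] args) {
        System.out.println(bankA().get("http://example.com/a")); // [alice, bob]
        System.out.println(bankB().get("alice"));                // [carol, dave]
    }
}
```

Both banks are keyed lookups that return a list, so a reach query is a lookup in Bank A followed by one lookup in Bank B per tweeter.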
The topology would be defined as follows:
TridentState urlToTweeterState =
    topology.newStaticState(getUrlToTweetersState());
TridentState tweetersToFollowerState =
    topology.newStaticState(getTweeterToFollowersState());
topology.newDRPCStream("reach")
    .stateQuery(urlToTweeterState, new Fields("args"),
                new MapGet(), new Fields("tweeters"))
    .each(new Fields("tweeters"), new ExpandList(),
          new Fields("tweeter"))
    .shuffle()
    .stateQuery(tweetersToFollowerState, new