Wikipedia Edit Events by Language
A less trivial example is to use the groupBy feature of the Trident
language to count the edits to language-specific Wikipedia sites. Unlike
what was described in the last section, Trident does not expose tick
events. Instead it emits an event after processing each batch. By
default, each batch contains 1,000 events. This makes the previously
described trivial topology somewhat uninteresting because it will
always produce [1000] as its output.
To implement this topology, the JSONToTuple function from Chapter
5 is first used to extract the “channel” elements from the JSON input
stream provided by Samza. The groupBy operator is then used to split
the stream into substreams, one per channel. Finally, a count
aggregator is used to compute the number of edits for each language.
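The grouping and counting steps can be illustrated without Storm at all. The following is only a minimal plain-Java sketch of what happens to each batch, not the Trident topology itself: the class EditCountSketch and its methods countByChannel and countPerBatch are hypothetical names introduced here, and the input is assumed to be a list of channel strings that have already been extracted from the JSON.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it does not use the Trident API.
public class EditCountSketch {

    // Group one batch by channel and count the edits in each group.
    static Map<String, Long> countByChannel(List<String> channels) {
        // LinkedHashMap keeps channels in the order they were first seen.
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String channel : channels) {
            counts.merge(channel, 1L, Long::sum);
        }
        return counts;
    }

    // Split the stream into fixed-size batches and count each batch
    // independently, so that one result is produced per batch.
    static List<Map<String, Long>> countPerBatch(List<String> channels, int batchSize) {
        List<Map<String, Long>> results = new ArrayList<>();
        for (int start = 0; start < channels.size(); start += batchSize) {
            int end = Math.min(start + batchSize, channels.size());
            results.add(countByChannel(channels.subList(start, end)));
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample input; a real batch would hold about 1,000 events.
        List<String> channels = Arrays.asList(
                "#en.wikipedia", "#de.wikipedia", "#en.wikipedia",
                "#fr.wikipedia", "#en.wikipedia");
        for (Map<String, Long> batch : countPerBatch(channels, 4)) {
            batch.forEach((channel, count) ->
                    System.out.println("[ '" + channel + "', " + count + " ]"));
        }
    }
}
```

Because every batch is counted on its own, the counts within one batch always sum to the batch size, which is why the grouped output is more informative than a single total.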
It takes a few moments to accumulate roughly 1,000 events, but after
some time something similar to the following events should be sent to
the node.js application:
[ '#en.wikipedia', 598 ]
[ '#de.wikipedia', 43 ]
[ '#sv.wikipedia', 16 ]
[ '#fr.wikipedia', 61 ]
[ '#pt.wikipedia', 41 ]
[ '#es.wikipedia', 100 ]
[ '#zh.wikipedia', 27 ]
[ '#pl.wikipedia', 2 ]
[ '#nl.wikipedia', 9 ]
[ '#it.wikipedia', 28 ]
[ '#ru.wikipedia', 35 ]
[ '#fi.wikipedia', 1 ]
[ '#ja.wikipedia', 42 ]