The Classic “Word Count” Example
The Word Count example is the “Hello World” of Big Data processing,
and no discussion of real-time stream processing would be complete
without it.
This example feeds a stream of Wikipedia edits from Kafka into
Trident to count words. The data source itself is provided by the
Samza package discussed in the next part of this chapter, and it makes
a handy source of test data. The stream is configured using the
TransactionalTridentKafkaSpout class:
TridentTopology topology = new TridentTopology();
TridentKafkaConfig config = new TridentKafkaConfig(
    new ZkHosts("localhost"),
    "wikipedia-raw",
    "storm"
);
config.scheme = new SchemeAsMultiScheme(new StringScheme());
topology.newStream("kafka", new TransactionalTridentKafkaSpout(config)).shuffle()
This spout emits JSON strings that must be parsed and split into
words for further processing. For simplicity, this function
implementation looks only at the title element of the raw output:
.each(new Fields("str"), new Function() {
    private static final long serialVersionUID = 1L;
    transient JSONParser parser;

    public void prepare(Map conf, TridentOperationContext context) {
        parser = new JSONParser();
    }

    public void cleanup() { }
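The execute method that does the actual parsing continues beyond this listing. As a rough, self-contained illustration of the tokenization step it performs, the sketch below extracts the title element from a raw JSON edit record and splits it into words. The class name TitleWords and the regex-based extraction are assumptions made for this sketch; the real function uses the JSONParser instance prepared above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the tokenization the Trident Function
// performs: pull out the "title" field and split it on whitespace.
public class TitleWords {
    // Matches "title": "..." in a flat JSON object; a stand-in for JSONParser.
    private static final Pattern TITLE =
            Pattern.compile("\"title\"\\s*:\\s*\"([^\"]*)\"");

    public static List<String> words(String json) {
        Matcher m = TITLE.matcher(json);
        if (!m.find()) {
            // No title element: emit nothing for this tuple.
            return Collections.emptyList();
        }
        return Arrays.asList(m.group(1).trim().split("\\s+"));
    }
}
```

In the topology, each extracted word would then be emitted as a separate tuple so that downstream operations can group and count them.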