The Classic “Word Count” Example
The Word Count example is the “Hello World” of Big Data processing,
and no discussion of real-time stream processing would be complete
without it.
This example reads a stream of Wikipedia edits from Kafka into
Trident to implement a word count. The data source is actually
provided by the Samza package discussed in the next part of this
chapter, and it makes a handy source of data for testing. The spout is
configured using the TransactionalTridentKafkaSpout class:
TridentTopology topology = new TridentTopology();
TridentKafkaConfig config = new TridentKafkaConfig(
        new ZkHosts("localhost"),
        "wikipedia-raw",
        "storm"
);
config.scheme = new SchemeAsMultiScheme(new StringScheme());
topology.newStream("kafka", new TransactionalTridentKafkaSpout(config))
        .shuffle()
This spout emits JSON strings that must be parsed and split into
words for further processing. For simplicity, this function
implementation looks only at the title element of the raw output:
    .each(new Fields("str"), new Function() {
        private static final long serialVersionUID = 1L;
        transient JSONParser parser;

        public void prepare(Map conf, TridentOperationContext context) {
            parser = new JSONParser();
        }

        public void cleanup() { }
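The remaining piece of this anonymous Function is its execute method, which must parse each raw JSON string and emit the title for downstream splitting and counting. As a framework-free illustration of that extraction step (the class name and sample record below are hypothetical, and the real code would use the JSONParser prepared above rather than this naive string scan):

```java
// Hypothetical, framework-free sketch of what the Function's execute
// method has to do: pull the "title" field out of a raw JSON record.
// The actual Trident code would parse with the prepared JSONParser and
// emit the result via the TridentCollector.
public class TitleExtractor {
    static String extractTitle(String json) {
        String key = "\"title\":\"";
        int start = json.indexOf(key);
        if (start < 0) return null;           // no title field present
        start += key.length();
        int end = json.indexOf('"', start);   // closing quote of the value
        return end < 0 ? null : json.substring(start, end);
    }

    public static void main(String[] args) {
        // Sample record shaped loosely like a Wikipedia edit event
        String raw = "{\"title\":\"Apache Storm\",\"diff-bytes\":42}";
        System.out.println(TitleExtractor.extractTitle(raw));
        // prints "Apache Storm"
    }
}
```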