The Classic “Word Count” Example
The Word Count example is the “Hello World” of Big Data processing,
and no discussion of real-time stream processing would be complete
without it.
This example feeds a stream of Wikipedia edits from Kafka into
Trident to count words. The data source itself is provided by the
Samza package discussed in the next part of this chapter, and it makes
a handy source of test data. The stream is configured using the
TransactionalTridentKafkaSpout class:
TridentTopology topology = new TridentTopology();
TridentKafkaConfig config = new TridentKafkaConfig(
    new ZkHosts("localhost"),
    "wikipedia-raw",
    "storm"
);
config.scheme = new SchemeAsMultiScheme(new StringScheme());
topology.newStream("kafka", new TransactionalTridentKafkaSpout(config)).shuffle()
This spout emits JSON strings that must be parsed and split into
words for further processing. For simplicity, this function
implementation looks only at the title element of the raw output:
.each(new Fields("str"), new Function() {
    private static final long serialVersionUID = 1L;
    transient JSONParser parser;

    public void prepare(Map conf, TridentOperationContext context) {
        parser = new JSONParser();
    }

    public void cleanup() { }
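The execute method that does the actual parsing continues beyond this listing. As a rough, self-contained illustration of the tokenization step it performs, the sketch below extracts the title element from a raw JSON edit record and splits it into words. The class name TitleWords and the regex-based extraction are assumptions made for this sketch; the real function uses the JSONParser instance prepared above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the tokenization the Trident Function
// performs: pull out the "title" field and split it on whitespace.
public class TitleWords {
    // Matches "title": "..." in a flat JSON object; a stand-in for JSONParser.
    private static final Pattern TITLE =
            Pattern.compile("\"title\"\\s*:\\s*\"([^\"]*)\"");

    public static List<String> words(String json) {
        Matcher m = TITLE.matcher(json);
        if (!m.find()) {
            // No title element: emit nothing for this tuple.
            return Collections.emptyList();
        }
        return Arrays.asList(m.group(1).trim().split("\\s+"));
    }
}
```

In the topology, each extracted word would then be emitted as a separate tuple so that downstream operations can group and count them.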