To use this in a topology, a KafkaSpout is created and then used like any
other spout. This simple test topology reports on the events being produced
on the Wikipedia-raw topic:
TopologyBuilder builder = new TopologyBuilder();
// config is the KafkaSpout configuration built in the previous section
builder.setSpout("kafka", new KafkaSpout(config), 10);
builder.setBolt("print", new LoggerBolt())
       .shuffleGrouping("kafka");

Config conf = new Config();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("example", conf,
        builder.createTopology());

// Let the topology run for a minute
Thread.sleep(60000);
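The LoggerBolt used here is not part of Storm itself and is presumably defined elsewhere in the text. A minimal sketch of such a bolt, assuming the pre-Apache backtype.storm packages used by this era of Storm and SLF4J for logging, might look like the following:
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical bolt that simply logs each tuple it receives.
public class LoggerBolt extends BaseBasicBolt {
    private static final Logger LOG =
            LoggerFactory.getLogger(LoggerBolt.class);

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Log the first field of the tuple; nothing is emitted downstream.
        LOG.info(tuple.getString(0));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt is a sink, so it declares no output fields.
    }
}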
Connecting Storm to Flume
There is a fundamental “impedance mismatch” between Storm and Flume.
Storm is a polling consumer that assumes a pull model, whereas Flume's
sink model is fundamentally push-based. Although some attempts have been
made, there is no accepted mechanism for directly connecting Flume and
Storm.
There are solutions to this problem, but all of them involve adding another
queuing or data-movement system into the mix. Two basic approaches are
used “in the wild.” The first is to use a queuing system compatible with
the Advanced Message Queuing Protocol (AMQP). For Flume applications, the
RabbitMQ queuing system is popular because Kenshoo has developed a Flume
sink for it (http://github.com/kenshoo/flume-rabbitmq-sink). An AMQPSpout
is then used on the Storm side to read from RabbitMQ.
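AMQPSpout is distributed separately from Storm's core, and its exact package and constructor vary between implementations. The following is a minimal sketch of wiring one into a topology, assuming a constructor that takes the RabbitMQ connection details, the queue to consume from, and a Scheme for deserializing message bodies; the host, credentials, and queue name are all illustrative:
// Hypothetical constructor: check your AMQPSpout implementation for the
// actual signature. StringScheme (from Storm's core) deserializes message
// bodies as UTF-8 strings.
AMQPSpout amqpSpout = new AMQPSpout(
        "rabbit.example.com",  // RabbitMQ host (illustrative)
        5672,                  // standard AMQP port
        "guest", "guest",      // credentials (illustrative)
        "/",                   // virtual host
        "flume-events",        // queue the Flume sink writes to (illustrative)
        new StringScheme());

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("amqp", amqpSpout, 10);
builder.setBolt("print", new LoggerBolt())
       .shuffleGrouping("amqp");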
The other option is to use a Kafka sink for Flume and then integrate Storm
with Kafka as described in the previous section. At this point, the only
real reason to use Flume is to integrate with an existing infrastructure.
You can find the Flume/Kafka sink at https://github.com/baniuyao/
flume-ng-kafka-sink.
Distributed Remote Procedure Calls
In addition to the development of standard processing topologies that
consume data from a stream, Storm also includes facilities for implementing
distributed remote procedure calls (DRPC).