KafkaStream<byte[], byte[]> stream =
    consumerMap.get(topic).get(0);
The KafkaStream object provides an iterator of class ConsumerIterator, which is used to retrieve data from the stream. Calling the next method on the iterator returns a MessageAndMetadata object that can be used to extract the key and the message:
ConsumerIterator<byte[], byte[]> iter = stream.iterator();
while (iter.hasNext()) {
    String message = new String(iter.next().message());
    System.out.println(message);
}
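As a minimal sketch of extracting the key as well as the message (assuming the same legacy Kafka 0.8 high-level consumer API and the stream variable from the previous snippet), the loop can be written against the MessageAndMetadata object directly; note that the key may be null if the producer did not set one:

import kafka.consumer.ConsumerIterator;
import kafka.message.MessageAndMetadata;

ConsumerIterator<byte[], byte[]> iter = stream.iterator();
while (iter.hasNext()) {
    // next() returns a MessageAndMetadata carrying both the key and the payload
    MessageAndMetadata<byte[], byte[]> record = iter.next();
    byte[] keyBytes = record.key(); // null when the producer sent no key
    String key = (keyBytes == null) ? "<no key>" : new String(keyBytes);
    String message = new String(record.message());
    System.out.println(key + " -> " + message);
}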
Apache Flume: Distributed Log Collection
Kafka was designed and built within a specific production environment. In
this case, its designers essentially had complete control over the available
software stack. In addition, they were replacing an existing component with
similar producer/consumer semantics (though different delivery
semantics), making Kafka relatively easy to integrate from an engineering
perspective.
In many production environments, developers are not so lucky. There are often a number of pre-existing legacy systems and data collection or logging paths. In addition, for a variety of reasons, it may not be possible to modify these systems' logging behavior. For example, the software could be produced by an outside vendor with its own goals and development roadmap.
To integrate with these environments, there is the Apache Flume project. Rather than providing a single opinionated mechanism for log collection, as Kafka does, Flume is a framework for integrating the data ingress and egress of a variety of data services. Originally, this meant primarily the collection of logs from sources such as web servers, with transport to HDFS in a Hadoop environment, but Flume has since expanded to cover a much larger set of use cases, sometimes blurring the line between the data motion discussed in this chapter and the processing systems discussed in Chapter 5.
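As a concrete illustration of that original use case, the following is a minimal sketch of a Flume agent configuration that tails a web server access log and delivers the events to HDFS. The agent name, file paths, and capacity shown here are hypothetical and would need to match a real deployment:

# Sketch of a Flume agent with one source, one channel, and one sink
agent1.sources = weblog
agent1.channels = memchan
agent1.sinks = hdfssink

# Source: tail a web server access log (exec source chosen for illustration)
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = memchan

# Channel: buffer events in memory between the source and the sink
agent1.channels.memchan.type = memory
agent1.channels.memchan.capacity = 10000

# Sink: write events to HDFS, bucketed by date
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.channel = memchan
agent1.sinks.hdfssink.hdfs.path = hdfs://namenode/flume/weblogs/%Y/%m/%d
agent1.sinks.hdfssink.hdfs.useLocalTimeStamp = true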