FIGURE 10.11 Hello World cloud service-generated web page
Deploying and Managing a Scalable Web Service with Flume on Amazon EC2
Machine-generated log data is valuable for locating the causes of various hardware and software failures. The log information can provide feedback for improving system architecture, reducing system degradation, and improving uptime. For businesses, this translates into cost savings and customer retention, and businesses have recently started mining this same log data for business insights. In the following sections, we present a how-to guide for deploying and using Flume on a Hadoop cluster (www.ibm.com/developerworks/library/bd-flumews/).
Flume Notes
Flume is a distributed service for efficiently collecting, aggregating, and reliably moving
large amounts of streaming event data from many sources to a centralized data store. Flume
architecture includes agents (nodes) and events (data flows).
A Flume event is a unit of data flow carrying a byte payload and an optional set of string attributes (headers). A Flume agent is a JVM process that hosts the components through which events flow from an external source to the next hop or to the end destination.
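To make the event model concrete, the following minimal Java sketch builds such an event with the EventBuilder helper from the Flume SDK. The class name, the sample log line, and the header names and values (host, component) are hypothetical and chosen only for illustration.

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

public class EventExample {
    public static void main(String[] args) {
        // Optional string attributes (headers) attached to the event.
        Map<String, String> headers = new HashMap<>();
        headers.put("host", "web-01");            // hypothetical header values
        headers.put("component", "nginx-access");

        // The payload is an arbitrary byte array; here, a single log line.
        Event event = EventBuilder.withBody(
                "GET /index.html 200".getBytes(StandardCharsets.UTF_8), headers);

        System.out.println("payload bytes: " + event.getBody().length);
        System.out.println("headers: " + event.getHeaders());
    }
}

The headers travel with the payload through the agent and can be inspected downstream, for example to tag events with their origin.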
Flume can be used to collect data at a remote location, and a collector can be configured on a Hadoop Distributed File System (HDFS) cluster, which then serves as the end storage. Figure 10.12 provides a simple illustration of the Flume architecture.
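As a sketch of the client side of this data flow, the following Java example ships an event to a remote Flume agent over Avro RPC using the RpcClient API from the Flume SDK. The hostname flume-agent.example.com and port 41414 are assumptions for illustration only; the receiving agent is assumed to expose an Avro source and to forward events toward a collector whose HDFS sink (configured separately and not shown here) writes them into the cluster, as in Figure 10.12.

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class LogShipper {
    public static void main(String[] args) {
        // Hypothetical address of a Flume agent whose Avro source is listening.
        RpcClient client =
                RpcClientFactory.getDefaultInstance("flume-agent.example.com", 41414);
        try {
            Event event = EventBuilder.withBody("application started",
                    StandardCharsets.UTF_8);
            // Delivered synchronously; an exception means the agent did not
            // accept the event.
            client.append(event);
        } catch (EventDeliveryException e) {
            // A real application would retry or rebuild the client here.
            e.printStackTrace();
        } finally {
            client.close();
        }
    }
}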