FIGURE 10.11 Hello World cloud service-generated web page
Deploying and Managing a Scalable Web Service with Flume on Amazon EC2
Machine-generated log data is valuable for locating the causes of various hardware and software failures. The log information can provide feedback for improving system architecture, reducing system degradation, and improving uptime. For businesses, this translates into cost savings and customer retention, and businesses have recently started mining this same log data for business insights. In the following sections, we present a how-to guide for deploying and using Flume on a Hadoop cluster (www.ibm.com/developerworks/library/bd-flumews/).
Flume Notes
Flume is a distributed service for efficiently collecting, aggregating, and reliably moving
large amounts of streaming event data from many sources to a centralized data store. Flume
architecture includes agents (nodes) and events (data flows).
A Flume event is a unit of data flow carrying a byte payload and an optional set of string attributes (headers). A Flume agent is a JVM process that hosts the components through which events flow from an external source to the next hop or to the end destination.
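To make the event model concrete, the following minimal Java sketch builds such an event with the EventBuilder helper from the Flume SDK. The class name, the sample log line, and the header names and values (host, component) are hypothetical and chosen only for illustration.

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

public class EventExample {
    public static void main(String[] args) {
        // Optional string attributes (headers) attached to the event.
        Map<String, String> headers = new HashMap<>();
        headers.put("host", "web-01");            // hypothetical header values
        headers.put("component", "nginx-access");

        // The payload is an arbitrary byte array; here, a single log line.
        Event event = EventBuilder.withBody(
                "GET /index.html 200".getBytes(StandardCharsets.UTF_8), headers);

        System.out.println("payload bytes: " + event.getBody().length);
        System.out.println("headers: " + event.getHeaders());
    }
}

The headers travel with the payload through the agent and can be inspected downstream, for example to tag events with their origin.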
Flume can be used to collect data at a remote location, and a collector can be configured on a Hadoop Distributed File System (HDFS) cluster, which then serves as the end storage. Figure 10.12 provides a simple illustration of the Flume architecture.
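As a sketch of the client side of this data flow, the following Java example ships an event to a remote Flume agent over Avro RPC using the RpcClient API from the Flume SDK. The hostname flume-agent.example.com and port 41414 are assumptions for illustration only; the receiving agent is assumed to expose an Avro source and to forward events toward a collector whose HDFS sink (configured separately and not shown here) writes them into the cluster, as in Figure 10.12.

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class LogShipper {
    public static void main(String[] args) {
        // Hypothetical address of a Flume agent whose Avro source is listening.
        RpcClient client =
                RpcClientFactory.getDefaultInstance("flume-agent.example.com", 41414);
        try {
            Event event = EventBuilder.withBody("application started",
                    StandardCharsets.UTF_8);
            // Delivered synchronously; an exception means the agent did not
            // accept the event.
            client.append(event);
        } catch (EventDeliveryException e) {
            // A real application would retry or rebuild the client here.
            e.printStackTrace();
        } finally {
            client.close();
        }
    }
}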