Unlike Scribe and Flume, Chukwa adds a powerful toolkit for monitoring and analysis beyond log collection and aggregation. For collection and aggregation, it's quite similar to Flume.
The Chukwa architecture is shown in Figure 17-2.
FIGURE 17-2: The Chukwa architecture. The diagram shows a Data Source (syslog, app1, app2, web server, …), an Agent, a Collector, a Data Sink on HDFS, Archival Storage, MapReduce jobs, and an RDBMS (or HBase).

Chukwa's reliance on the Hadoop infrastructure is both its strength and its weakness. As it's currently structured, it's meant to be a batch-oriented tool and not one for real-time analysis.

Read more about Chukwa in the following presentations and research papers:

"Chukwa: a scalable log collector" — www.usenix.org/event/lisa10/tech/slides/rabkin.pdf

"Chukwa: A large-scale monitoring system" — www.cca08.org/papers/Paper-13-Ariel-Rabkin.pdf

PIG
Pig provides a high-level data flow definition language and environment for large-scale data analysis using MapReduce jobs. Pig includes a language, called Pig Latin, which has a simple and intuitive syntax that makes it easy to write parallel programs. The Pig layer manages efficient execution of the parallel jobs by invoking MapReduce jobs under the hood.
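To give a flavor of the Pig Latin syntax, the following is a minimal sketch that loads a tab-separated log file, keeps the large requests, and projects a couple of fields. The file name and field names (access_log.txt, ip, url, bytes) are illustrative assumptions rather than anything defined in this chapter.

-- Minimal Pig Latin sketch; the file and field names are hypothetical.
logs = LOAD 'access_log.txt' USING PigStorage('\t')
           AS (ip:chararray, url:chararray, bytes:long);
big  = FILTER logs BY bytes > 10000;    -- keep responses larger than 10 KB
out  = FOREACH big GENERATE ip, url;    -- project only the fields of interest
STORE out INTO 'large_requests';

Each statement defines a relation; nothing executes until STORE (or DUMP) is reached, at which point Pig compiles the script into the underlying MapReduce jobs.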
The MapReduce framework forces developers to think of every algorithm in terms of map and
reduce functions. The MapReduce method of thinking breaks every operation into very simple
operations, which go through the two steps of map and reduce. The map function emits key/value
pairs of data and the reduce function runs aggregation or manipulation functions on these emitted
key/value pairs. The net result of this exercise is that every join, group, average, or count operation
needs to be defined every time in terms of its MapReduce equivalents. This hampers developer
productivity. In terms of the Hadoop infrastructure, it also involves writing a lot of Java code. Pig
provides a higher-level abstraction and provides a set of ready-to-use functions. Therefore, with Pig
you no longer need to write MapReduce jobs for join, group, average, and count from the ground
up. Also, the number of lines of code typically gets reduced from hundreds of lines of Java code to tens of
lines of Pig Latin script.
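As an illustration of that point, a join followed by a grouped count and average, which would require a full MapReduce program in Java, fits in a few Pig Latin statements. The relation names and fields below (users.tsv, visits.tsv, user_id, age, url, time_ms) are invented for this sketch.

-- Hedged sketch: join two hypothetical data sets, then group and aggregate.
users  = LOAD 'users.tsv'  AS (user_id:int, age:int);
visits = LOAD 'visits.tsv' AS (user_id:int, url:chararray, time_ms:long);
joined = JOIN visits BY user_id, users BY user_id;
by_age = GROUP joined BY users::age;
stats  = FOREACH by_age GENERATE
             group AS age,
             COUNT(joined)               AS visit_count,
             AVG(joined.visits::time_ms) AS avg_time_ms;
STORE stats INTO 'visit_stats_by_age';

The JOIN, GROUP, COUNT, and AVG operators are ready-to-use; Pig translates them into the equivalent map and reduce functions without the developer writing either.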
Not only does Pig reduce the number of lines of code, but the terse and easy syntax makes it
possible for non-programmers to run MapReduce jobs. As Pig evolves, it becomes possible for