Unlike Scribe and Flume, Chukwa adds a powerful toolkit for monitoring and analysis beyond log collection and aggregation. For collection and aggregation, it's quite similar to Flume.
The Chukwa architecture is shown in Figure 17-2.
FIGURE 17-2: The Chukwa architecture. The diagram shows a Data Source (syslog, app1, app2, web server, …), an Agent, a Collector, a Data Sink on HDFS, Archival Storage, MapReduce jobs, and an RDBMS (or HBase).

Chukwa's reliance on the Hadoop infrastructure is both its strength and its weakness. As it's currently structured, it's meant to be a batch-oriented tool and not one for real-time analysis.

Read more about Chukwa in the following presentations and research papers:

"Chukwa: a scalable log collector" — www.usenix.org/event/lisa10/tech/slides/rabkin.pdf

"Chukwa: A large-scale monitoring system" — www.cca08.org/papers/Paper-13-Ariel-Rabkin.pdf

PIG
Pig provides a high-level data flow definition language and environment for large-scale data analysis using MapReduce jobs. Pig includes a language, called Pig Latin, which has a simple and intuitive syntax that makes it easy to write parallel programs. The Pig layer manages efficient execution of the parallel jobs by invoking MapReduce jobs under the hood.
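To give a flavor of the Pig Latin syntax, the following is a minimal sketch that loads a tab-separated log file, keeps the large requests, and projects a couple of fields. The file name and field names (access_log.txt, ip, url, bytes) are illustrative assumptions rather than anything defined in this chapter.

-- Minimal Pig Latin sketch; the file and field names are hypothetical.
logs = LOAD 'access_log.txt' USING PigStorage('\t')
           AS (ip:chararray, url:chararray, bytes:long);
big  = FILTER logs BY bytes > 10000;    -- keep responses larger than 10 KB
out  = FOREACH big GENERATE ip, url;    -- project only the fields of interest
STORE out INTO 'large_requests';

Each statement defines a relation; nothing executes until STORE (or DUMP) is reached, at which point Pig compiles the script into the underlying MapReduce jobs.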
The MapReduce framework forces developers to think of every algorithm in terms of map and
reduce functions. The MapReduce method of thinking breaks every operation into very simple
operations, which go through the two steps of map and reduce. The map function emits key/value
pairs of data and the reduce function runs aggregation or manipulation functions on these emitted
key/value pairs. The net result of this exercise is that every join, group, average, or count operation
needs to be defined every time in terms of its MapReduce equivalents. This hampers developer
productivity. In terms of the Hadoop infrastructure, it also involves writing a lot of Java code. Pig
provides a higher-level abstraction and provides a set of ready-to-use functions. Therefore, with Pig
you no longer need to write MapReduce jobs for join, group, average, and count from the ground
up. Also, the number of lines of code typically gets reduced from hundreds of lines of Java code to tens of
lines of Pig Latin script.
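As an illustration of that point, a join followed by a grouped count and average, which would require a full MapReduce program in Java, fits in a few Pig Latin statements. The relation names and fields below (users.tsv, visits.tsv, user_id, age, url, time_ms) are invented for this sketch.

-- Hedged sketch: join two hypothetical data sets, then group and aggregate.
users  = LOAD 'users.tsv'  AS (user_id:int, age:int);
visits = LOAD 'visits.tsv' AS (user_id:int, url:chararray, time_ms:long);
joined = JOIN visits BY user_id, users BY user_id;
by_age = GROUP joined BY users::age;
stats  = FOREACH by_age GENERATE
             group AS age,
             COUNT(joined)               AS visit_count,
             AVG(joined.visits::time_ms) AS avg_time_ms;
STORE stats INTO 'visit_stats_by_age';

The JOIN, GROUP, COUNT, and AVG operators are ready-to-use; Pig translates them into the equivalent map and reduce functions without the developer writing either.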
Not only does Pig reduce the number of lines of code, but the terse and easy syntax makes it
possible for non-programmers to run MapReduce jobs. As Pig evolves, it becomes possible for