Oozie: Allows you to create a workflow for MapReduce jobs.
HBase: The Hadoop database, a NoSQL data store.
Mahout: A machine-learning library containing algorithms for clustering and classification.
Ambari: A project for monitoring cluster health statistics and instrumentation.
Figure 1-3 gives you an architectural view of the Apache Hadoop ecosystem. We will explore some of the
components in subsequent chapters of this book. For a complete reference, visit the Apache web site at
http://hadoop.apache.org/.
Figure 1-3. The Hadoop ecosystem
As you can see, deploying a Hadoop solution requires the setup and management of a complex ecosystem of
frameworks (often referred to as a zoo) across clusters of computers. This might be the only drawback of the Apache
Hadoop framework: the complexity and effort involved in creating an efficient cluster configuration and the ongoing
administration it requires. With storage now a commodity, people are looking for easy “off the shelf” offerings for
Hadoop solutions. This has led companies such as Cloudera, Greenplum, and others to offer their own distributions of
Hadoop as out-of-the-box packages. The objective is to make Hadoop solutions easy to configure and
available on diverse platforms. These packaged distributions have been a great success in the current era of predictive analysis driven by
Twitter, the pervasive use of social media, and the popularity of self-service BI. The future of IT is integration,
whether between closed- and open-source projects, between unstructured and structured
data, or in some other form. With the luxury of being able to store any type of data inexpensively, the world
is looking forward to entirely new dimensions of data processing and analytics.