Summary
This chapter examined the MapReduce paradigm and its application in Big Data
analytics. Specifically, it examined the implementation of MapReduce in Apache
Hadoop. The power of MapReduce is realized with the use of the Hadoop
Distributed File System (HDFS) to store data in a distributed system. The ability
to run a MapReduce job on the data stored across a cluster of machines enables
the parallel processing of petabytes or exabytes of data. Furthermore, by adding
machines to the cluster, Hadoop can scale as data volumes grow.
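As a reminder of the paradigm itself, the map, shuffle, and reduce steps can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real job would distribute the map and reduce calls across the cluster with HDFS supplying the input splits.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a cluster, each document (or input split) would be mapped on a
    # different node; here everything runs in one process for clarity.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call looks at one record and each reduce call looks at one key, neither step depends on where the other records live, which is what allows the work to be spread over many machines.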
This chapter examined several Apache projects within the Hadoop ecosystem. By
providing higher-level programming languages, Apache Pig and Hive simplify
code development by masking the underlying MapReduce logic needed to perform
common data processing tasks such as filtering, joining datasets, and restructuring
data. Once the data is properly conditioned within the Hadoop cluster, Apache
Mahout can be used to conduct data analyses such as clustering, classification, and
collaborative filtering.
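To give a sense of the logic that Pig and Hive hide, the following sketch writes a filter and a join by hand as map and reduce steps (a reduce-side join). It is plain Python under stated assumptions, not output generated by either tool: the record layout (user_id, name, amount) and all function names are hypothetical. In a higher-level language the same task would typically be a short declarative statement.

```python
from collections import defaultdict


def map_users(user):
    # Tag each record with its source so the reducer can tell them apart.
    yield user["user_id"], ("user", user["name"])


def map_orders(order, min_amount):
    # Filtering happens in the map step: small orders are never emitted.
    if order["amount"] >= min_amount:
        yield order["user_id"], ("order", order["amount"])


def reduce_join(user_id, tagged_values):
    # All records sharing a key arrive together; pair users with orders.
    names = [value for tag, value in tagged_values if tag == "user"]
    amounts = [value for tag, value in tagged_values if tag == "order"]
    for name in names:
        for amount in amounts:
            yield user_id, name, amount


def filter_and_join(users, orders, min_amount):
    grouped = defaultdict(list)  # stands in for the framework's shuffle
    for user in users:
        for key, value in map_users(user):
            grouped[key].append(value)
    for order in orders:
        for key, value in map_orders(order, min_amount):
            grouped[key].append(value)
    rows = []
    for key in sorted(grouped):
        rows.extend(reduce_join(key, grouped[key]))
    return rows
```

Even for this small task, the source tagging, grouping, and pairing have to be written explicitly, which is the bookkeeping that a higher-level language takes off the developer's hands.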
The strengths of MapReduce in Apache Hadoop and of the ecosystem projects
mentioned so far lie in batch processing environments. When real-time
processing, including reads and writes, is required, Apache HBase is an option.
HBase uses HDFS to store large volumes of data across the cluster, but it also
maintains recent changes within memory to ensure the real-time availability of
the latest data. Whereas MapReduce in Hadoop, Pig, and Hive are more
general-purpose tools that can address a wide range of tasks, HBase is a somewhat
more purpose-specific tool: data is retrieved from and written to HBase in
a well-understood manner.
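The idea of keeping recent changes in memory while older data sits in files can be illustrated with a toy key-value store. This is a simplified model under stated assumptions, not the HBase API: the ToyStore class, its flush_threshold parameter, and the in-process lists that stand in for files on HDFS are all hypothetical.

```python
import bisect


class ToyStore:
    """Toy model: recent writes live in memory, older ones in sorted 'files'."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}          # recent changes, held in memory
        self.flushed_files = []     # sorted (key, value) lists, newest last
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the in-memory contents out as one immutable, sorted file.
        if self.memstore:
            self.flushed_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key):
        # The newest data wins: check memory first, then files newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for stored in reversed(self.flushed_files):
            keys = [key for key, _ in stored]
            index = bisect.bisect_left(keys, row_key)
            if index < len(stored) and stored[index][0] == row_key:
                return stored[index][1]
        return None
```

A read always sees the latest write because memory is consulted before any file, while the bulk of the data can remain in files distributed across the cluster.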
HBase is one example of the NoSQL (Not only Structured Query Language) data
stores that are being developed to address specific Big Data use cases. Maintaining
and traversing a social network graph is one example of a task for which a relational
database is not the best choice of data store. However, relational databases and SQL remain
powerful and common tools and will be examined in more detail in Chapter 11.