Components of Apache Hadoop
Apache Hadoop is composed of two core components:
HDFS: HDFS is the storage component of Apache Hadoop, designed and developed to handle large files efficiently. It is a distributed filesystem that runs on a cluster and makes it easy to store large files by splitting them into blocks and distributing them redundantly across multiple nodes. Users of HDFS need not worry about the underlying networking details, as HDFS takes care of them. HDFS is written in Java and runs in user space. (A short sketch of writing a file to HDFS appears after this list.)
MapReduce: MapReduce is a programming model built from ideas found in functional programming and distributed computing. In MapReduce, a task is broken down into two parts: map and reduce. All data in MapReduce flows in the form of key-value pairs, <key, value>. Mappers emit key-value pairs, and reducers receive them, work on them, and produce the final result. This model was built specifically to query and process the large volumes of data stored in HDFS. (A word-count sketch appears after this list.)
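To make the storage side concrete, here is a minimal sketch of writing a file to HDFS through Hadoop's Java FileSystem API. The path and file contents are hypothetical placeholders, and the Configuration object is assumed to pick up the cluster's settings (such as core-site.xml) from the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    // Load cluster settings from the classpath (core-site.xml, etc.).
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical target path and contents, for illustration only.
    Path path = new Path("/user/example/hello.txt");
    try (FSDataOutputStream out = fs.create(path)) {
      out.writeUTF("Hello, HDFS!");
    }
    // The client just writes a stream of bytes; splitting the file
    // into blocks and replicating them across nodes happens behind
    // this API, which is why callers never deal with the networking.
  }
}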
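As an illustration of the map and reduce parts, the following is a minimal sketch of the classic word-count job using Hadoop's Java MapReduce API. The class names are illustrative, and the job-submission driver is omitted for brevity.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // The mapper emits <word, 1> for every token in its input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // emit <word, 1>
      }
    }
  }

  // The reducer receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result); // emit <word, total count>
    }
  }
}

The framework groups all values sharing a key between the two parts, so the reducer sees every count emitted for a given word and can produce the final total.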
We will be going through HDFS and MapReduce in depth in the next chapter.