How Hadoop Works
A client program accesses unstructured and semi-structured data from sources
including log files, social media feeds, and internal data stores. It breaks the data up
into parts, which are then loaded into a file system made up of multiple nodes running
on commodity hardware. The default file store in Hadoop is the Hadoop Distributed
File System, or HDFS. File systems such as HDFS are adept at storing large volumes of
unstructured and semi-structured data, as they do not require data to be organized into
relational rows and columns.
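To make this concrete, here is a minimal sketch of loading a file into HDFS through Hadoop's Java FileSystem API. The class name and both paths are illustrative, and the cluster address is assumed to come from the standard core-site.xml configuration on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadToHdfs {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster address from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical paths: a local log file is copied into HDFS, where it
        // is split into blocks and distributed across the cluster's nodes.
        fs.copyFromLocalFile(new Path("/var/log/app/server.log"),
                             new Path("/data/logs/server.log"));
        fs.close();
    }
}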
Each part is replicated multiple times as it is loaded into the file system, so that if a node fails, another node holds a copy of the data that was stored on the failed node. A Name Node acts as a facilitator, communicating back to the client information such as which nodes are available, where in the cluster certain data resides, and which nodes have failed.
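The Name Node's role as the keeper of this metadata can be seen through the same Java API. The sketch below, with a hypothetical file path, asks where each block of a file is replicated; the answer comes from the Name Node's metadata, not from the data nodes themselves.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path; substitute a file that exists in your cluster.
        Path file = new Path("/data/logs/server.log");
        FileStatus status = fs.getFileStatus(file);

        // The Name Node answers this query from its metadata: for each
        // block of the file, which nodes hold a replica.
        BlockLocation[] blocks =
            fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Block at offset " + block.getOffset()
                + " replicated on: " + String.join(", ", block.getHosts()));
        }
    }
}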
Once the data is loaded into the cluster, it is ready to be analyzed via the map-reduce framework. The client program submits a map job, usually a query written in Java, to a node in the cluster known as the Job Tracker. The Job Tracker consults the Name Node to determine which data the job needs to access and where in the cluster that data is located. Once that is determined, the Job Tracker submits the query to the relevant nodes.
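As a sketch of what such a Java map job can look like, the mapper below emits a count of one for every word it finds in a line of input. The class name is illustrative, and word counting stands in for whatever query the client actually submits.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Runs on the nodes that hold the data blocks: each call processes
// one line of input and emits (word, 1) pairs.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}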
The design philosophy is that rather than bringing all the data back to a central location for processing, the processing is sent to the data: it occurs at each node simultaneously, in parallel. This data locality is an essential characteristic of Hadoop.
When each node has finished its processing task, it stores the results locally. The client program then initiates a reduce job through the Job Tracker, in which the results of the map phase stored on the individual nodes are aggregated to produce the answer to the original query and then loaded onto another node in the cluster. The client accesses these results, which can then be loaded into one of a number of analytic environments for analysis. The map-reduce job is now complete (Figure 5-6).
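To round out the sketch, a matching reducer and driver are shown below, again with illustrative names; the driver is the piece the client program submits to the cluster, and it wires in the mapper sketched earlier. The reducer aggregates the per-node map output into the final counts.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Aggregates the per-node map output: sums the counts emitted
    // for each word across the whole cluster.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // The driver the client submits; input and output paths are
        // passed on the command line.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}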