Database Reference
[Figure 2.15 diagram. Labels: an SQL query enters the SMS planner, which produces a MapReduce job. The Hadoop core comprises the MapReduce framework and HDFS. The master node hosts the NameNode and the JobTracker. Node 1, Node 2, ..., Node n each host a TaskTracker, a DataNode, and a Database. Tasks with InputFormat reach the databases through the Database connector (InputFormat implementations).]
FIGURE 2.15 The architecture of HadoopDB. (From A. Abouzeid et al., PVLDB, 2(1), 922-933, 2009.)
from Hadoop. At the same time, it tries to achieve the performance of parallel databases by
doing most of the query processing inside the database engine. Figure 2.15 illustrates
the architecture of HadoopDB, which consists of two layers: (1) a data storage layer,
the Hadoop Distributed File System* (HDFS), and (2) a data-processing layer,
the MapReduce framework. In this architecture, HDFS is a block-structured file
system managed by a central NameNode. Individual files are broken into blocks of a
fixed size and distributed across multiple DataNodes in the cluster. The NameNode
maintains metadata about the size and location of blocks and their replicas. The
MapReduce framework follows a simple master-slave architecture. The master is a
single JobTracker, and the slaves, or worker nodes, are TaskTrackers. The JobTracker
handles the runtime scheduling of MapReduce jobs and maintains information on
each TaskTracker's load and available resources. The Database Connector is the
interface between the independent database systems residing on nodes in the cluster and
the TaskTrackers. The Connector connects to the database, executes the SQL query, and
* http://hadoop.apache.org/hdfs/.
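The HDFS description above (files split into fixed-size blocks, blocks spread over DataNodes, and a NameNode tracking the size and location of each block and its replicas) can be illustrated with a toy model. This is only a sketch under stated assumptions, not HDFS code: the names `ToyNameNode`, `BlockInfo`, and `add_file` are hypothetical, the block size of 64 bytes is chosen purely for readability, and the round-robin replica placement is a simplification rather than the placement policy HDFS actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 64  # assumption: tiny illustrative block size, in bytes


@dataclass
class BlockInfo:
    """Metadata the toy NameNode keeps for one block."""
    block_id: int
    size: int
    replicas: List[str]  # names of the DataNodes holding a copy


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: it stores metadata only, no data."""
    datanodes: List[str]
    replication: int = 2
    files: Dict[str, List[BlockInfo]] = field(default_factory=dict)

    def add_file(self, name: str, length: int) -> List[BlockInfo]:
        if self.replication > len(self.datanodes):
            raise ValueError("replication factor exceeds number of DataNodes")
        blocks: List[BlockInfo] = []
        offset = 0
        block_id = 0
        while offset < length:
            # Every block has the fixed size except possibly the last one.
            size = min(BLOCK_SIZE, length - offset)
            # Assumption: simple round-robin placement of replicas.
            replicas = [
                self.datanodes[(block_id + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            blocks.append(BlockInfo(block_id, size, replicas))
            offset += size
            block_id += 1
        self.files[name] = blocks
        return blocks


namenode = ToyNameNode(datanodes=["node1", "node2", "node3"])
for block in namenode.add_file("sales.dat", 150):
    print(block.block_id, block.size, block.replicas)
```

With these assumed values, a 150-byte file is recorded as three blocks of 64, 64, and 22 bytes, each with two replicas on different nodes. The point of the sketch is only that the NameNode holds the block-to-node mapping while the blocks themselves live on the DataNodes.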