MapReduce limitations (Version 1, Hadoop MapReduce)
Scalability:
Maximum cluster size: 4,000 nodes
Maximum concurrent tasks: 40,000
Coarse synchronization in JobTracker
Single point of failure:
The NameNode or JobTracker can become a choke point
Failure kills all queued and running jobs
Jobs need to be resubmitted by users
Restart is very tricky due to complex state
Hard partition of resources into Map and Reduce slots
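The hard split between Map and Reduce slots is easy to underestimate, so here is a minimal sketch of its cost. In Hadoop 1 each TaskTracker advertises a fixed number of map slots and reduce slots (set by properties such as mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum), and a map task can never run in a reduce slot. The Cluster class and both functions are hypothetical; the model assumes every map task takes one time unit and that no reduce task is running while the map phase is in progress. It compares the fixed partition with a hypothetical pooled model in which any slot can run any task.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical cluster with a fixed number of slots of each kind per node."""
    nodes: int
    map_slots_per_node: int
    reduce_slots_per_node: int

    @property
    def map_slots(self) -> int:
        return self.nodes * self.map_slots_per_node

    @property
    def reduce_slots(self) -> int:
        return self.nodes * self.reduce_slots_per_node


def map_phase_waves(cluster: Cluster, map_tasks: int, pooled: bool) -> int:
    """Waves needed to finish the map phase (each map task = 1 time unit).

    pooled=False: Hadoop 1 style, map tasks may only occupy map slots.
    pooled=True: hypothetical shared pool, map tasks may occupy every slot.
    """
    usable = cluster.map_slots
    if pooled:
        usable += cluster.reduce_slots
    return math.ceil(map_tasks / usable)


def idle_fraction_during_maps(cluster: Cluster, pooled: bool) -> float:
    """Share of all slots left idle while only map tasks are runnable.

    With a hard partition, every reduce slot sits idle during a full map wave.
    """
    if pooled:
        return 0.0
    total = cluster.map_slots + cluster.reduce_slots
    return cluster.reduce_slots / total
```

With 10 nodes of 2 map and 2 reduce slots each, a 200-map job needs 10 waves under the hard partition but only 5 if the slots were pooled, and half of the cluster's slots stay idle during the map phase.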
MapReduce v2 (YARN)
At the time of writing, MapReduce v2, known as YARN, has been in stable release for about six
months. It addresses a number of the limitations of v1.
One issue that has been addressed concerns the JobTracker, a major component in data processing
because it manages both resource marshaling and job execution at the level of individual tasks (Figure 4.10).
This design has deficiencies in:
Memory consumption
Threading model
Scalability
[Figure 4.10 diagram: two Clients, one JobTracker, and three TaskTrackers, each running several Tasks; arrows are labeled "Job Submission" and "MapReduce Status".]
FIGURE 4.10
MapReduce classic JobTracker architecture.
Source: Apache Foundation.
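To make the memory, threading, and single-point-of-failure concerns concrete, the following is a toy, single-process model of the classic JobTracker role shown in Figure 4.10. ToyJobTracker and its methods are hypothetical and are not Hadoop's API; the sketch ignores map/reduce slot types, data locality, retries, and speculative execution. It shows three things: per-task state for every job lives in one process (memory grows with the total number of tasks), every submission and TaskTracker heartbeat serializes on one coarse lock, and losing that process loses all queued and running work.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass
class TaskState:
    job_id: str
    task_id: int
    kind: str  # "map" or "reduce"
    status: str = "PENDING"


class ToyJobTracker:
    """Hypothetical model of a monolithic JobTracker.

    A single object both hands out tasks to TaskTrackers (resource
    marshaling) and tracks the state of every task of every job
    (job execution), all guarded by one lock.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()  # coarse lock: every call serializes here
        self._tasks: dict[tuple[str, int], TaskState] = {}  # grows with all tasks of all jobs
        self._pending: list[tuple[str, int]] = []  # FIFO queue of unassigned tasks

    def submit_job(self, job_id: str, maps: int, reduces: int) -> None:
        """Client job submission: register one state entry per task."""
        with self._lock:
            for i in range(maps + reduces):
                kind = "map" if i < maps else "reduce"
                self._tasks[(job_id, i)] = TaskState(job_id, i, kind)
                self._pending.append((job_id, i))

    def heartbeat(self, tracker: str, free_slots: int) -> list[tuple[str, int]]:
        """A TaskTracker reports free slots and receives tasks to run."""
        with self._lock:
            assigned = self._pending[:free_slots]
            del self._pending[:free_slots]
            for key in assigned:
                self._tasks[key].status = f"RUNNING on {tracker}"
            return assigned

    def tracked_tasks(self) -> int:
        """Number of task records held in this one process."""
        with self._lock:
            return len(self._tasks)

    def crash(self) -> None:
        """Simulate a JobTracker failure: all in-memory job state is lost,
        so users must resubmit every queued and running job."""
        with self._lock:
            self._tasks.clear()
            self._pending.clear()
```

For example, after submitting a 4-task job and a 2-task job, the tracker holds 6 task records; a heartbeat with 2 free slots receives the first two tasks of the first job, and a subsequent crash() leaves nothing to recover.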