MapReduce limitations (Version 1, Hadoop MapReduce)
● Scalability:
  ● Maximum cluster size: 4,000 nodes
  ● Maximum concurrent tasks: 40,000
  ● Coarse synchronization in JobTracker
● Single point of failure:
  ● NameNode or JobTracker can become the choking point
  ● Failure kills all queued and running jobs
  ● Jobs need to be resubmitted by users
  ● Restart is very tricky due to complex state
● Hard partition of resources into Map and Reduce slots
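The cost of a hard partition into Map and Reduce slots can be illustrated with a small sketch. The class names, slot counts, and the shared-pool alternative below are hypothetical, chosen only for contrast; they are not Hadoop APIs or configuration values.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """Toy model of a v1-style worker with a fixed split of map and reduce slots."""
    map_slots: int
    reduce_slots: int

    def runnable(self, pending_maps: int, pending_reduces: int) -> int:
        # Each task type can only occupy slots of its own kind.
        return min(pending_maps, self.map_slots) + min(pending_reduces, self.reduce_slots)


@dataclass
class PooledNode:
    """Hypothetical worker whose capacity is a single shared pool (for contrast)."""
    capacity: int

    def runnable(self, pending_maps: int, pending_reduces: int) -> int:
        # Any pending task may take any free unit of capacity.
        return min(pending_maps + pending_reduces, self.capacity)


slotted = SlotNode(map_slots=4, reduce_slots=2)
pooled = PooledNode(capacity=6)

# Map-heavy phase: six map tasks are waiting and no reduce tasks yet.
print(slotted.runnable(6, 0))  # 4: the two reduce slots sit idle
print(pooled.runnable(6, 0))   # 6
```

With the same total capacity, the slotted node leaves a third of its resources unused during a map-only phase, which is the kind of waste the hard partition implies.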
MapReduce v2 (YARN)
At the time of writing, MapReduce v2, known as YARN, is about six months into a stable
release. This release addresses a number of the limitations of v1.
One of them concerns the JobTracker, a major component in data processing that manages
the key tasks of resource marshaling and job execution at the individual task level
(Figure 4.10).
This component has deficiencies in:
● Memory consumption
● Threading model
● Scalability
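The JobTracker's combined role, together with the single-point-of-failure issue listed earlier, can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyJobTracker`, its methods, and the slot counts are hypothetical and do not mirror Hadoop's actual classes.

```python
from typing import Dict, List


class ToyJobTracker:
    """Hypothetical single tracker that owns both resource and task state."""

    def __init__(self, total_slots: int) -> None:
        self.free_slots = total_slots      # resource marshaling
        self.queued: List[str] = []        # jobs waiting for slots
        self.running: Dict[str, int] = {}  # job id -> slots held (task tracking)

    def submit(self, job_id: str, tasks: int) -> None:
        # Start the job if enough slots are free, otherwise queue it.
        if tasks <= self.free_slots:
            self.free_slots -= tasks
            self.running[job_id] = tasks
        else:
            self.queued.append(job_id)

    def crash(self) -> List[str]:
        # All state lives in this one process, so a failure loses every job.
        lost = list(self.running) + self.queued
        self.running.clear()
        self.queued.clear()
        return lost


tracker = ToyJobTracker(total_slots=10)
tracker.submit("job-a", 6)
tracker.submit("job-b", 6)  # does not fit, so it is queued
print(tracker.crash())      # ['job-a', 'job-b']: both must be resubmitted
```

Because scheduling and per-task bookkeeping sit in one in-memory object, its memory use grows with the number of jobs and tasks it tracks, and its failure takes every queued and running job with it.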
[Diagram: Clients, one JobTracker, and several TaskTrackers, each running Tasks; arrows labeled "MapReduce Status" and "Job Submission".]
FIGURE 4.10
MapReduce classic JobTracker architecture.
Source: Apache Foundation.