closest to the data. After the decision has been made, the JobTracker can
submit the workload to the targeted TaskTrackers.
TaskTrackers are monitored by the JobTracker. This is a bottom-up
monitoring process. Each TaskTracker must “report in” via a heartbeat
signal. If it fails to do so for any reason, the JobTracker assumes it has failed
and reassigns the tasks accordingly. Similarly, if an error occurs during the
processing of an assigned task, the TaskTracker is responsible for reporting
it to the JobTracker. The decision on what to do next then lies with the
JobTracker.
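To make that failure-handling flow concrete, the sketch below shows, in Java, how a heartbeat-expiry check might work in principle. The class, the Scheduler interface, and the ten-minute expiry window are illustrative assumptions for this example only; they are not the actual JobTracker implementation.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatMonitor {
    // Assumed expiry window for this sketch; the real value is configurable.
    private static final long EXPIRY_MS = 10 * 60 * 1000;

    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, List<String>> assignedTasks = new HashMap<>();

    // Minimal scheduler abstraction used only for this sketch.
    public interface Scheduler {
        void reassign(String taskId);
    }

    // Called each time a TaskTracker "reports in" with a heartbeat.
    public synchronized void recordHeartbeat(String trackerId) {
        lastHeartbeat.put(trackerId, System.currentTimeMillis());
    }

    // Periodically scan for trackers that have gone silent and hand their
    // tasks back to the scheduler so they can be reassigned elsewhere.
    public synchronized void checkExpired(Scheduler scheduler) {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> entry : lastHeartbeat.entrySet()) {
            if (now - entry.getValue() > EXPIRY_MS) {
                List<String> orphaned = assignedTasks.remove(entry.getKey());
                if (orphaned != null) {
                    orphaned.forEach(scheduler::reassign);
                }
            }
        }
    }
}

The essential point is that the JobTracker does not chase the TaskTrackers; it simply notices when the heartbeats stop arriving and treats the silent node's work as failed.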
The JobTracker keeps a record of the tasks as they complete. It maintains
the status of the job, and a client application can poll it to get the latest state
of the job.
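As a rough illustration of that polling, the following sketch uses the standard org.apache.hadoop.mapreduce.Job client API to submit a job and query its progress. The job name, the five-second polling interval, and the omitted mapper/reducer setup are assumptions of this example, not requirements of the API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobStatusPoller {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "example-job");
        // Mapper, reducer, and input/output paths would be configured here
        // before submission; that setup is omitted in this sketch.

        job.submit();  // hand the job to the framework without blocking

        // Poll for the latest state until the job finishes.
        while (!job.isComplete()) {
            System.out.printf("map %.0f%%  reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println(job.isSuccessful() ? "Job succeeded" : "Job failed");
    }
}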
NOTE
The JobTracker is a single point of failure for the MapReduce engine. If
it goes down, all running jobs are halted, and new jobs cannot be
scheduled.
Important Apache Projects for Hadoop
Now that we have a conceptual grasp of the core projects for Hadoop (the
brain and heart, if you will), we can start to flesh out our understanding of
the broader ecosystem. There are a number of projects that fall under the
Hadoop umbrella. Some will succeed, while others will wither and die. That
is the very nature of open source software. The good ideas get developed,
evolve, and become great—at least, that's the theory.
Some of the projects we are about to discuss are driving a great deal of
innovation, especially for Hadoop 2.0. Hive is the most notable project in
this regard. Almost all the work around the Hortonworks Stinger initiative
aims to improve SQL capabilities in Hadoop, and many of those changes will
be delivered through the Hive project. Therefore, it is important to know
what Hive is and why it is getting so much attention.