remove punctuation, split the words by whitespace, iterate over this array of words, forward each word as a key, and set its value to one. These make up the intermediate key-value pairs, as indicated in the following figure:
Hadoop MapReduce framework in action (simplified)
These results are sorted by key and forwarded to the Reducer implementation that you provided. All tuples arriving at a single reduce call share the same key, and reducers can rely on this property in their logic. This point is important for beginners: it means you can simply iterate over the incoming iterator and group or count the values; in effect, you reduce or fold the map by key.
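The map, sort, and reduce steps described above can be sketched in plain Java. Note that this is a simulation of the word-count flow for illustration, not the Hadoop API itself; the class and method names are made up for this sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A plain-Java simulation of the word-count flow: map emits (word, 1)
// pairs, and reduce groups them by key and folds each group into a count.
public class WordCountSketch {

    // Map step: strip punctuation, split on whitespace, and emit an
    // intermediate (word, 1) pair for each word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        String cleaned = line.toLowerCase().replaceAll("\\p{Punct}", "");
        for (String word : cleaned.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1)); // intermediate key-value pair
            }
        }
        return pairs;
    }

    // Sort and reduce steps: a TreeMap keeps keys sorted, and merge()
    // folds all values that share a key, mirroring the reducer's loop
    // over its incoming iterator.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = reduce(map("the quick fox, the lazy dog."));
        System.out.println(counts); // {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

In real Hadoop, the sort and grouping between map and reduce is done by the framework's shuffle phase across many machines; here the TreeMap stands in for both.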
The reduced values are then stored in a place of your choice, which can be HDFS,
RDBMS, Cassandra, or one of the many other storage options.
There are two main processes that you should know about in the context of Hadoop MapReduce, which we will talk about in a bit.
JobTracker
Similar to NameNode, JobTracker is a master process that governs the execution of worker processes such as TaskTracker. Like the master in any master-slave architecture, JobTracker is a single point of failure. Therefore, it is advisable to have robust hardware and redundancy built into the machine that runs JobTracker.
JobTracker's responsibilities include estimating the number of Mapper tasks from the input splits (for example, file splits from HDFS via InputFormat). For the number of Reducer tasks, it uses the value already set in the configuration. A client application can use JobClient to submit jobs to JobTracker and inquire about the status of a job.
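A submission through JobClient might look like the following sketch, using the classic org.apache.hadoop.mapred API. This is a configuration fragment, not a runnable program: it will not compile without the Hadoop libraries on the classpath, and the WordCount, WordCountMapper, and WordCountReducer classes, along with the input and output paths, are placeholders:

```java
// Configure the job; the reducer count is set explicitly, while the
// mapper count is derived by the framework from the input splits.
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("word-count");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(WordCountMapper.class);     // placeholder class
conf.setReducerClass(WordCountReducer.class);   // placeholder class
conf.setNumReduceTasks(2);                      // configured, not estimated
FileInputFormat.setInputPaths(conf, new Path("/input"));   // placeholder path
FileOutputFormat.setOutputPath(conf, new Path("/output")); // placeholder path

// Submit to JobTracker and inquire about the job's status.
RunningJob job = JobClient.submitJob(conf);
System.out.println("map progress: " + job.mapProgress());
```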