On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
If, for example, N is 2, then each split contains two lines. One mapper will receive the
first two key-value pairs:
(0, On the top of the Crumpetty Tree)
(33, The Quangle Wangle sat,)
And another mapper will receive the second two key-value pairs:
(57, But his face you could not see,)
(89, On account of his Beaver Hat.)
The keys and values are the same as those that TextInputFormat produces. The
difference is in the way the splits are constructed.
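The grouping described above can be imitated in plain Java to make the keys and splits concrete. This is a minimal sketch, not Hadoop's implementation: the class NLineSplitSketch and its split() method are hypothetical names, and the sketch assumes UTF-8 text with single-byte "\n" line endings, so each key is the byte offset of the start of its line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; it does not use any Hadoop classes.
class NLineSplitSketch {

  // Groups the lines of text into splits of at most n lines each.
  // Every record is rendered as "(byteOffset, line)".
  public static List<List<String>> split(String text, int n) {
    List<List<String>> splits = new ArrayList<>();
    List<String> current = new ArrayList<>();
    long offset = 0;
    for (String line : text.split("\n")) {
      current.add("(" + offset + ", " + line + ")");
      // Advance past the line's bytes plus the one-byte newline (assumption).
      offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
      if (current.size() == n) {
        splits.add(current);
        current = new ArrayList<>();
      }
    }
    if (!current.isEmpty()) {
      splits.add(current); // last split may hold fewer than n lines
    }
    return splits;
  }

  public static void main(String[] args) {
    String text = "On the top of the Crumpetty Tree\n"
        + "The Quangle Wangle sat,\n"
        + "But his face you could not see,\n"
        + "On account of his Beaver Hat.\n";
    List<List<String>> splits = split(text, 2);
    for (int i = 0; i < splits.size(); i++) {
      System.out.println("Split " + i + ":");
      for (String record : splits.get(i)) {
        System.out.println("  " + record);
      }
    }
  }
}
```

With N set to 2, the sketch yields two splits whose records carry the offsets 0, 33, 57, and 89 shown above.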
Usually, having a map task for a small number of lines of input is inefficient (due to the
overhead in task setup), but there are applications that take a small amount of input data
and run an extensive (i.e., CPU-intensive) computation on it, then emit their output.
Simulations are a good example. By creating an input file that specifies input parameters, one
per line, you can perform a parameter sweep: run a set of simulations in parallel to find
how a model varies as the parameter changes.
WARNING
If you have long-running simulations, you may fall afoul of task timeouts. When a task doesn't report
progress for more than 10 minutes, the application master assumes it has failed and aborts the process
(see Task Failure).
The best way to guard against this is to report progress periodically, by writing a status message or
incrementing a counter, for example. See What Constitutes Progress in MapReduce?
Another example is using Hadoop to bootstrap data loading from multiple data sources,
such as databases. You create a “seed” input file that lists the data sources, one per line.
Then each mapper is allocated a single data source, and it loads the data from that source
into HDFS. The job doesn't need the reduce phase, so the number of reducers should be
set to zero (by calling setNumReduceTasks() on Job). Furthermore, MapReduce
jobs can be run to process the data loaded into HDFS. See Appendix C for an example.
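A driver for such a seed-file job might look like the following sketch. It is an illustration under stated assumptions rather than the book's own example: SeedFileLoader and LoadSourceMapper are hypothetical names, the actual loading logic is left as a placeholder, and the input and output paths are taken from the command line. It needs the Hadoop MapReduce client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: one mapper per line of the seed file, no reducers.
public class SeedFileLoader {

  // Hypothetical mapper: each input value is one line naming a data source.
  public static class LoadSourceMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text source, Context context)
        throws IOException, InterruptedException {
      // A status message counts as progress, which helps avoid task timeouts.
      context.setStatus("Loading " + source);
      // Placeholder: real code would read from the source and write to HDFS.
      context.write(source, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "seed file loader");
    job.setJarByClass(SeedFileLoader.class);

    // Each split, and therefore each mapper, gets exactly one line.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(LoadSourceMapper.class);
    job.setNumReduceTasks(0); // map-only job: no reduce phase
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because the number of reducers is zero, each mapper's output is written directly to the output directory, one file per map task.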