    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Running a Distributed MapReduce Job
The same program will run, without alteration, on a full dataset. This is the point of
MapReduce: it scales to the size of your data and the size of your hardware. Here's one
data point: on a 10-node EC2 cluster running High-CPU Extra Large instances, the
program took six minutes to run. [21]
We'll go through the mechanics of running programs on a cluster in Chapter 6.