MapReduce Features - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

which is then used to construct partitions. Luckily, we don't have to write the code to do

this ourselves, as Hadoop comes with a selection of samplers.

The InputSampler class defines a nested Sampler interface whose implementations

return a sample of keys given an InputFormat and Job :

public interface Sampler < K , V > {

K [] getSample ( InputFormat < K , V > inf , Job job )

throws IOException , InterruptedException ;

}

Figure 9-1. Temperature distribution for the weather dataset

This interface usually is not called directly by clients. Instead, the writeParti-

tionFile() static method on InputSampler is used, which creates a sequence file

to store the keys that define the partitions:

Search WWH ::

Custom Search

Home