public static void addInputPath(Job job, Path path,
    Class<? extends InputFormat> inputFormatClass)
This is useful when you only have one mapper (set using the Job's setMapperClass()
method) but multiple input formats.
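To make the one-mapper, many-formats idea concrete without a Hadoop installation, here is a toy, standard-library-only Java model. Everything in it (MultipleInputsSketch, KV, RecordFormat, TEXT, KEY_VALUE, run) is hypothetical and is not Hadoop's API; the two formats are only loosely modeled on TextInputFormat and KeyValueTextInputFormat. What it illustrates is that the format is chosen per input path, while a single mapper consumes whatever records the formats produce, so in this model the formats must agree on the record type the mapper expects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy, stdlib-only model: several input paths, each with its own (hypothetical)
// record format, all feeding one shared mapper. This is NOT Hadoop's API.
public class MultipleInputsSketch {

    // A key/value record, standing in for what a RecordReader would emit.
    static final class KV {
        final String key;
        final String value;
        KV(String key, String value) { this.key = key; this.value = value; }
    }

    // Hypothetical stand-in for an InputFormat: turns one line into one record.
    interface RecordFormat {
        KV parse(long lineIndex, String line);
    }

    // Loosely modeled on TextInputFormat: the value is the whole line.
    // (Hadoop keys by byte offset; this toy keys by line index instead.)
    static final RecordFormat TEXT = (i, line) -> new KV(Long.toString(i), line);

    // Loosely modeled on KeyValueTextInputFormat: split at the first tab;
    // a line with no tab becomes the key, with an empty value.
    static final RecordFormat KEY_VALUE = (i, line) -> {
        int tab = line.indexOf('\t');
        return tab < 0 ? new KV(line, "")
                       : new KV(line.substring(0, tab), line.substring(tab + 1));
    };

    // The single shared mapper: it sees only KV records, never the format.
    static String map(KV record) {
        return record.value.toUpperCase();
    }

    // Route each path's lines through that path's format, then the shared mapper.
    static List<String> run(Map<String, List<String>> files,
                            Map<String, RecordFormat> formatByPath) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, RecordFormat> e : formatByPath.entrySet()) {
            List<String> lines = files.get(e.getKey());
            for (int i = 0; i < lines.size(); i++) {
                out.add(map(e.getValue().parse(i, lines.get(i))));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = Map.of(
                "/in/plain", List.of("alpha beta"),
                "/in/tabbed", List.of("id7\tgamma delta"));
        Map<String, RecordFormat> formatByPath = new LinkedHashMap<>();
        formatByPath.put("/in/plain", TEXT);
        formatByPath.put("/in/tabbed", KEY_VALUE);
        System.out.println(run(files, formatByPath)); // [ALPHA BETA, GAMMA DELTA]
    }
}
```

In a real driver, each path would instead be registered through the addInputPath() overload shown above, with the shared mapper set once via setMapperClass().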
Database Input (and Output)
DBInputFormat is an input format for reading data from a relational database, using
JDBC. Because it doesn't have any sharding capabilities, you need to be careful not to
overwhelm the database from which you are reading by running too many mappers. For
this reason, it is best used for loading relatively small datasets, perhaps for joining with
larger datasets from HDFS using MultipleInputs. The corresponding output format
is DBOutputFormat, which is useful for dumping job outputs (of modest size) into a
database.
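The caution about mapper counts follows from how the input is divided. The sketch below is a simplified, standard-library-only model that assumes DBInputFormat behaves as in the Hadoop source: it counts the rows, divides the count by the configured number of map tasks (mapreduce.job.maps), gives the last task any remainder, and has each map task fetch its own range with a separate LIMIT/OFFSET query over its own JDBC connection. The class name DbSplitSketch, the table name readings, and the ordering column id are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (an assumption, not Hadoop code) of DBInputFormat's splitting:
// rowCount / numMaps rows per split, with the remainder going to the last split.
public class DbSplitSketch {

    // Half-open row range [start, end) handled by one map task.
    static final class Split {
        final long start;
        final long end;
        Split(long start, long end) { this.start = start; this.end = end; }
        long length() { return end - start; }
    }

    static List<Split> splits(long rowCount, int numMaps) {
        if (numMaps < 1) throw new IllegalArgumentException("numMaps must be >= 1");
        long chunk = rowCount / numMaps;
        List<Split> out = new ArrayList<>();
        for (int i = 0; i < numMaps; i++) {
            long start = i * chunk;
            // The last split runs to the end of the table, absorbing the remainder.
            long end = (i + 1 == numMaps) ? rowCount : start + chunk;
            out.add(new Split(start, end));
        }
        return out;
    }

    // Shape of the query one map task would issue (hypothetical table and column).
    static String query(Split s) {
        return "SELECT * FROM readings ORDER BY id LIMIT " + s.length() + " OFFSET " + s.start;
    }

    public static void main(String[] args) {
        // 10 rows over 3 map tasks: ranges of 3, 3, and 4 rows.
        for (Split s : splits(10, 3)) {
            System.out.println(query(s));
        }
    }
}
```

Under these assumptions, raising the number of map tasks shrinks each range but adds one more concurrent query against the same database, which is the load the paragraph above warns about.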
For an alternative way of moving data between relational databases and HDFS, consider
using Sqoop, which is described in Chapter 15.
HBase's TableInputFormat is designed to allow a MapReduce program to operate
on data stored in an HBase table. TableOutputFormat is for writing MapReduce outputs
into an HBase table.