public static void addInputPath(Job job, Path path, Class<? extends InputFormat> inputFormatClass)
This is useful when you only have one mapper (set using the Job's setMapperClass() method) but multiple input formats.
Database Input (and Output)
DBInputFormat is an input format for reading data from a relational database, using JDBC. Because it doesn't have any sharding capabilities, you need to be careful not to overwhelm the database from which you are reading by running too many mappers. For this reason, it is best used for loading relatively small datasets, perhaps for joining with larger datasets from HDFS using MultipleInputs. The corresponding output format is DBOutputFormat, which is useful for dumping job outputs (of modest size) into a database.
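To make the warning about too many mappers concrete, the following is a minimal sketch in plain Java with no Hadoop dependency. It assumes, purely for illustration, that the table is divided into contiguous row ranges and that each mapper pages through its range with a LIMIT/OFFSET query. The class DbSplitSketch, its methods, and the table and column names are hypothetical and are not part of Hadoop's API. The point is only that every additional mapper adds another query against the same single database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop.
class DbSplitSketch {

    /** One mapper's share of the table: rows [offset, offset + length). */
    static final class RowRange {
        final long offset;
        final long length;

        RowRange(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Divides rowCount rows into numMappers contiguous ranges.
     * The last range absorbs any remainder so that every row is covered once.
     */
    static List<RowRange> planSplits(long rowCount, int numMappers) {
        if (rowCount < 0 || numMappers < 1) {
            throw new IllegalArgumentException("rowCount must be >= 0 and numMappers >= 1");
        }
        List<RowRange> splits = new ArrayList<>();
        long chunk = rowCount / numMappers;
        for (int i = 0; i < numMappers; i++) {
            long offset = i * chunk;
            long length = (i == numMappers - 1) ? rowCount - offset : chunk;
            splits.add(new RowRange(offset, length));
        }
        return splits;
    }

    /** Builds the query one mapper would issue under the LIMIT/OFFSET assumption. */
    static String queryFor(String table, String orderBy, RowRange range) {
        return "SELECT * FROM " + table
                + " ORDER BY " + orderBy
                + " LIMIT " + range.length
                + " OFFSET " + range.offset;
    }

    public static void main(String[] args) {
        // Four mappers means four queries hitting the same database.
        for (RowRange range : planSplits(1000, 4)) {
            System.out.println(queryFor("employees", "id", range));
        }
    }
}
```

Under this model, doubling the number of mappers halves the rows each one reads but doubles the number of queries the database has to serve, which is why a small mapper count suits this kind of input.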
For an alternative way of moving data between relational databases and HDFS, consider using Sqoop, which is described in Chapter 15.
HBase's TableInputFormat is designed to allow a MapReduce program to operate on data stored in an HBase table. TableOutputFormat is for writing MapReduce outputs into an HBase table.