  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(HBaseConfiguration.create(),
        new SimpleRowCounter(), args);
    System.exit(exitCode);
  }
}
The RowCounterMapper nested class is a subclass of the HBase TableMapper abstract class, a specialization of org.apache.hadoop.mapreduce.Mapper that sets the map input types passed by TableInputFormat. Input keys are ImmutableBytesWritable objects (row keys), and values are Result objects (row results from a scan). Since this job counts rows and does not emit any output from the map, we just increment Counters.ROWS by 1 for every row we see.
In the run() method, we create a scan object that is used to configure the job by invoking the TableMapReduceUtil.initTableMapperJob() utility method, which, among other things (such as setting the map class to use), sets the input format to TableInputFormat.
Notice how we set a filter, an instance of FirstKeyOnlyFilter, on the scan. This filter runs server-side and instructs the server to short-circuit after the first cell of each row, so the Result object passed to the mapper is populated with only that cell. Since the mapper ignores the cell values, this is a useful optimization.
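The effect of the filter on a row count can be illustrated with a small in-memory model. This is only a sketch in plain Java and does not use the HBase API: the class FirstKeyOnlyModel, its methods, and the sample table are hypothetical stand-ins, with nested sorted maps playing the role of rows and cells. It shows that keeping only the first cell of each row leaves the row count unchanged while reducing the number of cells handed to the counting step.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model; none of these names come from HBase.
final class FirstKeyOnlyModel {

    // Models a first-key-only filter: keep just the first cell of each row.
    static SortedMap<String, SortedMap<String, String>> firstCellOnly(
            SortedMap<String, SortedMap<String, String>> table) {
        SortedMap<String, SortedMap<String, String>> filtered = new TreeMap<>();
        for (Map.Entry<String, SortedMap<String, String>> row : table.entrySet()) {
            SortedMap<String, String> cells = row.getValue();
            if (cells.isEmpty()) {
                continue; // a row without cells is treated as absent
            }
            SortedMap<String, String> first = new TreeMap<>();
            first.put(cells.firstKey(), cells.get(cells.firstKey()));
            filtered.put(row.getKey(), first);
        }
        return filtered;
    }

    // Models the mapper: visited once per row, ignores the cells, bumps a counter.
    static long countRows(SortedMap<String, SortedMap<String, String>> rows) {
        long counter = 0;
        for (SortedMap<String, String> ignoredCells : rows.values()) {
            counter += 1;
        }
        return counter;
    }

    // Total number of cells that would be shipped to the counting step.
    static long countCells(SortedMap<String, SortedMap<String, String>> rows) {
        long cells = 0;
        for (SortedMap<String, String> row : rows.values()) {
            cells += row.size();
        }
        return cells;
    }

    // Made-up sample data: three rows holding six cells in total.
    static SortedMap<String, SortedMap<String, String>> sampleTable() {
        SortedMap<String, SortedMap<String, String>> table = new TreeMap<>();
        SortedMap<String, String> row1 = new TreeMap<>();
        row1.put("info:a", "1");
        row1.put("info:b", "2");
        row1.put("info:c", "3");
        SortedMap<String, String> row2 = new TreeMap<>();
        row2.put("info:a", "4");
        SortedMap<String, String> row3 = new TreeMap<>();
        row3.put("data:x", "5");
        row3.put("info:a", "6");
        table.put("row1", row1);
        table.put("row2", row2);
        table.put("row3", row3);
        return table;
    }
}
```

In this model, countRows() returns the same value for the full table and for the output of firstCellOnly(), while countCells() drops from six to three, which is the saving the filter is meant to provide.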
TIP
You can also find the number of rows in a table by typing count 'tablename' in the HBase shell. It's not distributed, though, so for large tables the MapReduce program is preferable.
REST and Thrift
HBase ships with REST and Thrift interfaces. These are useful when the interacting application is written in a language other than Java. In both cases, a Java server hosts an instance of the HBase client that brokers REST and Thrift application requests into and out of the HBase cluster. Consult the Reference Guide for information on running the services and on the client interfaces.
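As a rough illustration of what a REST client request looks like, the following sketch builds (but does not send) an HTTP GET for a single row. It is written in Java only to match the rest of the chapter; in practice the caller would typically be in another language. It assumes a REST server reachable at a base URL such as http://localhost:8080 and the /<table>/<row> resource path; the class HBaseRestSketch, the table name, and the row key are hypothetical, and the Reference Guide remains the authority on the actual endpoints and response formats.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only constructs a request and never opens a connection.
final class HBaseRestSketch {

    // Percent-encodes one path segment (URLEncoder emits '+' for spaces, so fix that up).
    static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds a GET request for one row, asking for a JSON response.
    static HttpRequest rowRequest(String baseUrl, String table, String rowKey) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        URI uri = URI.create(base + "/" + encodeSegment(table) + "/" + encodeSegment(rowKey));
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The resulting request could then be passed to java.net.http.HttpClient.send() against a running REST server; that step is left out here because it depends on a live cluster.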