SCANNERS
HBase scanners are like cursors in a traditional database or Java iterators, except that, unlike Java
iterators, they have to be closed after use. Scanners return rows in order. Users obtain a scanner on a
Table object by calling getScanner(), passing a configured instance of a Scan object as a parameter. In the
Scan instance, you can pass the row at which to start and stop the scan, which columns in a row to return
in the row result, and a filter to run on the server side. The ResultScanner interface, which is
returned when you call getScanner(), is as follows:
    public interface ResultScanner extends Closeable, Iterable<Result> {
      public Result next() throws IOException;
      public Result[] next(int nbRows) throws IOException;
      public void close();
    }
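The open, iterate, close pattern that this interface implies can be sketched without an HBase cluster. The example below is a toy model using only the JDK: InMemoryScanner and ScannerUsageSketch are hypothetical names, a TreeMap stands in for a table sorted by row key, and the stop row is treated as exclusive. It is not the HBase client API; it only illustrates why a type that is both Closeable and Iterable fits naturally into a try-with-resources block.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a scanner; NOT the HBase client API.
class InMemoryScanner implements Closeable, Iterable<String> {
    private final Iterator<String> rows;
    private boolean closed = false;

    // Visits row keys in sorted order, from startRow (inclusive) to stopRow
    // (exclusive). The exclusive stop row is an assumption of this sketch.
    InMemoryScanner(TreeMap<String, String> table, String startRow, String stopRow) {
        this.rows = table.subMap(startRow, true, stopRow, false).keySet().iterator();
    }

    // A scanner is a single pass over the rows, so the same iterator is returned each time.
    @Override
    public Iterator<String> iterator() {
        return rows;
    }

    // A real scanner would release server-side resources here.
    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class ScannerUsageSketch {
    // Collects the row keys seen by a scan; try-with-resources guarantees close().
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        List<String> seen = new ArrayList<>();
        try (InMemoryScanner scanner = new InMemoryScanner(table, startRow, stopRow)) {
            for (String row : scanner) {
                seen.add(row);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1", "value1");
        table.put("row2", "value2");
        table.put("row3", "value3");
        System.out.println(scan(table, "row1", "row3")); // prints [row1, row2]
    }
}
```

Using try-with-resources rather than an explicit finally block means the scanner is closed even if processing a row throws an exception.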
You can ask for the next row's results, or a number of rows. Scanners will, under the covers, fetch
batches of 100 rows at a time, bringing them client-side and returning to the server to fetch the next
batch only after the current batch has been exhausted. The number of rows to fetch and cache in this way
is determined by the hbase.client.scanner.caching configuration option. Alternatively, you
can set how many rows to cache on the Scan instance itself via the setCaching() method.
Higher caching values will enable faster scanning but will eat up more memory in the client. Also, avoid
setting the caching so high that the time spent processing the batch client-side exceeds the scanner
timeout period. If a client fails to check back with the server before the scanner timeout expires, the
server will go ahead and garbage collect resources consumed by the scanner server-side. The default scanner
timeout is 60 seconds, and can be changed by setting hbase.client.scanner.timeout.period. Clients will
see an UnknownScannerException if the scanner timeout has expired.
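The trade-off between the caching value, round-trips to the server, and the scanner timeout can be made concrete with a small model. The sketch below uses only the JDK and is not how the HBase client is implemented: BatchingScannerModel, countRoundTrips(), and batchFitsTimeout() are hypothetical names, the "server" is just a list, and scanner open and close calls and any final empty fetch are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of client-side scanner caching; NOT the HBase implementation.
class BatchingScannerModel {
    private final List<String> serverRows;             // rows held by the pretend server
    private final int caching;                          // rows fetched per round-trip
    private final List<String> cache = new ArrayList<>();
    private int serverPos = 0;                          // next row to fetch from the server
    private int cachePos = 0;                           // next row to hand to the caller
    private int roundTrips = 0;

    BatchingScannerModel(List<String> serverRows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        this.serverRows = serverRows;
        this.caching = caching;
    }

    // Returns the next row, or null once every row has been returned.
    String next() {
        if (cachePos == cache.size()) {                 // current batch exhausted
            if (serverPos == serverRows.size()) {
                return null;                            // nothing left on the server
            }
            fetchBatch();
        }
        return cache.get(cachePos++);
    }

    // Simulates one round-trip that brings up to 'caching' rows client-side.
    private void fetchBatch() {
        cache.clear();
        cachePos = 0;
        int end = Math.min(serverPos + caching, serverRows.size());
        cache.addAll(serverRows.subList(serverPos, end));
        serverPos = end;
        roundTrips++;
    }

    int roundTrips() {
        return roundTrips;
    }

    // Scans totalRows rows to the end and reports how many batches were fetched.
    static int countRoundTrips(int totalRows, int caching) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) {
            rows.add("row" + i);
        }
        BatchingScannerModel scanner = new BatchingScannerModel(rows, caching);
        while (scanner.next() != null) {
            // a real client would process the row here
        }
        return scanner.roundTrips();
    }

    // True if processing a whole batch takes less time than the scanner timeout.
    static boolean batchFitsTimeout(int caching, long perRowMillis, long timeoutMillis) {
        return (long) caching * perRowMillis < timeoutMillis;
    }

    public static void main(String[] args) {
        System.out.println(countRoundTrips(250, 100));           // prints 3
        System.out.println(countRoundTrips(250, 1));             // prints 250
        System.out.println(batchFitsTimeout(100, 500, 60_000));  // prints true
        System.out.println(batchFitsTimeout(1000, 100, 60_000)); // prints false
    }
}
```

In this model, raising the caching value from 1 to 100 cuts the fetches for 250 rows from 250 to 3, while the last call shows a batch of 1,000 rows at 100 ms per row taking longer to process than the 60-second timeout.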
The simplest way to compile the program is to use the Maven POM that comes with the
book's example code. Then we can use the hbase command followed by the classname
to run the program. Here's a sample run:
% mvn package
% export HBASE_CLASSPATH=hbase-examples.jar
% hbase ExampleClient
Get: keyvalues={row1/data:1/1414932826551/Put/vlen=6/mvcc=0}
Scan: keyvalues={row1/data:1/1414932826551/Put/vlen=6/mvcc=0}
Scan: keyvalues={row2/data:2/1414932826564/Put/vlen=6/mvcc=0}
Scan: keyvalues={row3/data:3/1414932826566/Put/vlen=6/mvcc=0}
Each line of output shows an HBase row, rendered using the toString() method from
Result. The fields are separated by a slash character, and are as follows: the row name,
the column name, the cell timestamp, the cell type, the length of the value's byte array
(vlen), and an internal HBase field (mvcc). We'll see later how to get the value from a
Result object using its getValue() method.
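To make the field layout concrete, the following sketch splits one of the sample output lines into the six fields just described. It uses only the JDK, and CellLine is a hypothetical helper rather than part of HBase. It assumes one cell per line and no slash characters in the row or column name. The toString() output is meant for people to read, so treat this as an illustration of the format, not a robust way to read cell data.

```java
// Hypothetical helper for picking apart one line of the sample output above.
class CellLine {
    final String row;
    final String column;
    final long timestamp;
    final String type;
    final int vlen;
    final long mvcc;

    private CellLine(String row, String column, long timestamp,
                     String type, int vlen, long mvcc) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.type = type;
        this.vlen = vlen;
        this.mvcc = mvcc;
    }

    // Parses a line such as
    //   Scan: keyvalues={row1/data:1/1414932826551/Put/vlen=6/mvcc=0}
    // Assumes a single cell between the braces and no '/' in row or column.
    static CellLine parse(String line) {
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no {...} section in: " + line);
        }
        String[] f = line.substring(open + 1, close).split("/");
        if (f.length != 6 || !f[4].startsWith("vlen=") || !f[5].startsWith("mvcc=")) {
            throw new IllegalArgumentException("unexpected cell format: " + line);
        }
        return new CellLine(
            f[0],                                              // row name
            f[1],                                              // column name (family:qualifier)
            Long.parseLong(f[2]),                              // cell timestamp
            f[3],                                              // cell type, e.g. Put
            Integer.parseInt(f[4].substring("vlen=".length())),
            Long.parseLong(f[5].substring("mvcc=".length())));
    }

    public static void main(String[] args) {
        CellLine c = parse("Scan: keyvalues={row1/data:1/1414932826551/Put/vlen=6/mvcc=0}");
        System.out.println(c.row + " " + c.column + " " + c.timestamp
            + " " + c.type + " " + c.vlen + " " + c.mvcc);
    }
}
```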