HBase - Hadoop: The Definitive Guide


SCANNERS

HBase scanners are like cursors in a traditional database or Java iterators, except that, unlike Java iterators, they have to be closed after use. Scanners return rows in sorted row-key order. Users obtain a scanner on a Table object by calling getScanner(), passing a configured instance of a Scan object as a parameter. In the Scan instance, you can pass the row at which to start and stop the scan, which columns in a row to return in the row result, and a filter to run on the server side. The ResultScanner interface, which is returned when you call getScanner(), is as follows:

public interface ResultScanner extends Closeable, Iterable<Result> {
  public Result next() throws IOException;
  public Result[] next(int nbRows) throws IOException;
  public void close();
}
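Because ResultScanner extends Closeable, a scanner can sit in a Java try-with-resources statement, which guarantees that close() runs even if the loop throws. With the real client that shape is try (ResultScanner scanner = table.getScanner(scan)) { for (Result result : scanner) { ... } }. The self-contained sketch below is a toy model only, not HBase code. The hypothetical ToyScanner class scans an in-memory TreeMap and mimics three behaviors described above: rows come back in sorted order, the scan runs from a start row (inclusive) to a stop row (exclusive), and the scanner must be closed. Java String ordering stands in for HBase's byte-wise row-key ordering.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of a scanner over an in-memory sorted map.
// This is NOT the HBase client API; it only mimics the behaviors described in the text.
class ToyScanner implements Closeable, Iterable<String> {
    private final Iterator<String> rows;
    private boolean closed = false;

    // Start row is inclusive and stop row is exclusive, as with an HBase Scan.
    ToyScanner(NavigableMap<String, String> table, String startRow, String stopRow) {
        this.rows = table.subMap(startRow, true, stopRow, false).keySet().iterator();
    }

    // Single pass, like a real scanner: every call hands back the same underlying iterator.
    @Override
    public Iterator<String> iterator() {
        if (closed) {
            throw new IllegalStateException("scanner already closed");
        }
        return rows;
    }

    // Declared without "throws IOException", as ResultScanner.close() is.
    @Override
    public void close() {
        closed = true; // a real scanner would also release its server-side resources
    }

    boolean isClosed() {
        return closed;
    }
}

class ToyScannerDemo {
    // Collects row keys in [startRow, stopRow); try-with-resources guarantees close().
    static List<String> scanRows(NavigableMap<String, String> table, String startRow, String stopRow) {
        List<String> seen = new ArrayList<>();
        try (ToyScanner scanner = new ToyScanner(table, startRow, stopRow)) {
            for (String row : scanner) {
                seen.add(row);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("row3", "value3"); // inserted out of order on purpose
        table.put("row1", "value1");
        table.put("row2", "value2");
        System.out.println(scanRows(table, "row1", "row3")); // prints [row1, row2]
    }
}
```

Note that row3 is not returned: the stop row bounds the scan but is excluded from it.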

You can ask for the next row's results or for a number of rows at once. Under the covers, scanners fetch batches of 100 rows at a time by default, bringing them client-side and returning to the server to fetch the next batch only after the current batch has been exhausted. The number of rows to fetch and cache in this way is determined by the hbase.client.scanner.caching configuration option. Alternatively, you can set how many rows to cache on the Scan instance itself via the setCaching() method.
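To make the effect of caching concrete, the following is a hypothetical toy model, not the HBase client's implementation. A client-side buffer refills itself with up to caching rows per simulated round trip to the server, and the model counts those trips. It illustrates why a larger caching value means fewer trips for the same number of rows. The real client's RPC behavior differs in detail, so treat the counts as illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical model of client-side scanner caching; NOT the HBase client code.
class CachingScannerModel {
    private final List<String> serverRows;   // rows held "server-side", in sorted order
    private final int caching;               // rows fetched per simulated round trip
    private final Deque<String> buffer = new ArrayDeque<>();
    private int serverCursor = 0;            // index of the next row the "server" returns
    private int roundTrips = 0;

    CachingScannerModel(List<String> serverRows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        this.serverRows = serverRows;
        this.caching = caching;
    }

    // Returns the next row, or null once the scan is exhausted.
    String next() {
        if (buffer.isEmpty() && serverCursor < serverRows.size()) {
            // Buffer drained: one simulated round trip fetches the next batch.
            int end = Math.min(serverCursor + caching, serverRows.size());
            buffer.addAll(serverRows.subList(serverCursor, end));
            serverCursor = end;
            roundTrips++;
        }
        return buffer.poll(); // null when both buffer and "server" are empty
    }

    int roundTrips() {
        return roundTrips;
    }

    // Drains a scan over n synthetic rows and reports how many round trips it took.
    static int roundTripsFor(int n, int caching) {
        List<String> rows = IntStream.range(0, n)
                .mapToObj(i -> String.format("row%05d", i))
                .collect(Collectors.toList());
        CachingScannerModel scanner = new CachingScannerModel(rows, caching);
        while (scanner.next() != null) {
            // consume rows; a real client would process each Result here
        }
        return scanner.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println(roundTripsFor(1000, 1));   // prints 1000
        System.out.println(roundTripsFor(1000, 100)); // prints 10
        System.out.println(roundTripsFor(1000, 500)); // prints 2
    }
}
```

The trade-off named in the text is also visible in the model: the buffer grows to caching rows, so fewer trips cost more client memory.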

Higher caching values make scanning faster, because fewer round trips to the server are needed, but they consume more memory in the client. Also, avoid setting the caching so high that the time spent processing a batch client-side exceeds the scanner timeout period. If a client fails to check back with the server before the scanner timeout expires, the server will go ahead and garbage collect the resources consumed by the scanner server-side. The default scanner timeout is 60 seconds and can be changed by setting hbase.client.scanner.timeout.period. Clients will see an UnknownScannerException if the scanner timeout has expired.
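The timeout advice reduces to simple arithmetic. If the client spends roughly t milliseconds processing each row, a batch of caching rows keeps the scanner idle on the server for about caching × t milliseconds, and that must stay below the timeout. The helper below is a hypothetical sketch of that estimate. The per-row processing time and the safety fraction are assumptions you would measure or choose yourself, and the estimate ignores network latency.

```java
// Hypothetical helper: estimates the largest caching value whose client-side
// processing time stays within a chosen fraction of the scanner timeout.
class ScannerCachingEstimate {
    /**
     * @param timeoutMs      scanner timeout in ms (hbase.client.scanner.timeout.period; 60000 by default)
     * @param perRowMs       assumed client-side processing time per row, in ms
     * @param safetyFraction fraction of the timeout budgeted for processing one batch (0 < f <= 1)
     */
    static int maxSafeCaching(long timeoutMs, double perRowMs, double safetyFraction) {
        if (timeoutMs <= 0 || perRowMs <= 0 || safetyFraction <= 0 || safetyFraction > 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        double rows = Math.floor(timeoutMs * safetyFraction / perRowMs);
        // Clamp to [1, Integer.MAX_VALUE]; caching must fetch at least one row.
        return (int) Math.max(1, Math.min(rows, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // 60 s timeout, 5 ms of work per row, spend at most half the timeout per batch.
        System.out.println(maxSafeCaching(60_000, 5.0, 0.5)); // prints 6000
    }
}
```

Leaving headroom below the full timeout allows for rows that take longer than the assumed average.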

The simplest way to compile the program is to use the Maven POM that comes with the book's example code. Then we can use the hbase command followed by the class name to run the program. Here's a sample run:

% mvn package
% export HBASE_CLASSPATH=hbase-examples.jar
% hbase ExampleClient
Get: keyvalues={row1/data:1/1414932826551/Put/vlen=6/mvcc=0}
Scan: keyvalues={row1/data:1/1414932826551/Put/vlen=6/mvcc=0}
Scan: keyvalues={row2/data:2/1414932826564/Put/vlen=6/mvcc=0}
Scan: keyvalues={row3/data:3/1414932826566/Put/vlen=6/mvcc=0}

Each line of output shows an HBase row, rendered using the toString() method from Result. The fields are separated by a slash character and are as follows: the row name, the column name (family:qualifier), the cell timestamp, the cell type, the length of the value's byte array (vlen), and an internal HBase field (mvcc). We'll see later how to get the value from a Result object using its getValue() method.
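The slash-separated layout described above can be picked apart mechanically, which makes the field order easy to check. The parser below is a hypothetical sketch for reading lines like those in the sample run. It is not part of HBase. It assumes one cell per Result, as in the sample, and assumes no slash appears inside the row key or qualifier. HBase allows arbitrary bytes there, so real data can break this. It splits the column at its first colon into family and qualifier. In real code you would read fields from the Result object itself rather than parsing its string form.

```java
// Hypothetical parser for one cell as printed by Result.toString() in the sample run.
// Assumes a single cell per line and no '/' inside the row key or qualifier.
class CellLine {
    final String row, family, qualifier, type;
    final long timestamp;
    final int valueLength; // the "vlen" field
    final long mvcc;       // internal HBase field

    CellLine(String text) {
        String[] f = text.split("/");
        if (f.length != 6) {
            throw new IllegalArgumentException("unexpected format: " + text);
        }
        int colon = f[1].indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no family:qualifier in: " + f[1]);
        }
        row = f[0];
        family = f[1].substring(0, colon);
        qualifier = f[1].substring(colon + 1);
        timestamp = Long.parseLong(f[2]);
        type = f[3];
        valueLength = Integer.parseInt(stripPrefix(f[4], "vlen="));
        mvcc = Long.parseLong(stripPrefix(f[5], "mvcc="));
    }

    // Accepts a whole output line such as "Scan: keyvalues={...}" holding one cell.
    static CellLine fromResultLine(String line) {
        int open = line.indexOf('{');
        int close = line.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no keyvalues={...} in: " + line);
        }
        return new CellLine(line.substring(open + 1, close));
    }

    private static String stripPrefix(String s, String prefix) {
        if (!s.startsWith(prefix)) {
            throw new IllegalArgumentException("expected " + prefix + " in: " + s);
        }
        return s.substring(prefix.length());
    }

    public static void main(String[] args) {
        CellLine c = fromResultLine("Scan: keyvalues={row1/data:1/1414932826551/Put/vlen=6/mvcc=0}");
        System.out.println(c.row + " " + c.family + " " + c.qualifier + " " + c.timestamp
                + " " + c.type + " " + c.valueLength + " " + c.mvcc);
        // prints: row1 data 1 1414932826551 Put 6 0
    }
}
```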
