To turn off the WAL, set the durability on each Put:
put.setDurability(Durability.SKIP_WAL);
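For context, here is a minimal sketch of a complete write with the WAL skipped, assuming the HBase 1.x client API; the table name mytable, the row key, and the column names are placeholders:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf("mytable"))) {
    Put put = new Put(Bytes.toBytes("row1"));
    put.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("col1"),
        Bytes.toBytes("value1"));
    // Skip the write-ahead log for this mutation; faster, but the edit is
    // lost if the region server fails before the MemStore is flushed.
    put.setDurability(Durability.SKIP_WAL);
    table.put(put);
}
SKIP_WAL trades durability for write throughput, so it is only appropriate for data you can regenerate if a region server goes down.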
More tips for high-performing HBase reads
So far, we have looked at tips for writing data into HBase. Now, let's take a look at some
tips for reading data faster.
The scan cache
When reading a large number of rows, it is better to set scan caching to a high
number (in the hundreds or thousands). Otherwise, each row that is scanned results
in a separate round trip to the HRegionServer. This is especially encouraged for
MapReduce jobs, as they will likely consume a large number of rows sequentially.
To set scan caching, use the following code:
Scan scan = new Scan();
scan.setCaching(1000);
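To see the setting in context, here is a rough sketch of a cached scan being consumed through a ResultScanner (ResultScanner and Result live in org.apache.hadoop.hbase.client; the table handle and column names are placeholders, and 1,000 is just an example value to tune against your row size and client memory):
Scan scan = new Scan();
scan.setCaching(1000); // fetch 1,000 rows per RPC instead of the default
try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result result : scanner) {
        // Most iterations are served from the client-side row cache,
        // so only every 1,000th row triggers a trip to the region server.
        byte[] value = result.getValue(Bytes.toBytes("family1"),
            Bytes.toBytes("col1"));
        // process value...
    }
}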
Only read the families or columns needed
When fetching a row, HBase returns all the families and all the columns by default.
If you only care about one family or a few columns, specifying them saves
needless I/O.
To specify a family, use this:
scan.addFamily(Bytes.toBytes("family1"));
To specify columns, use this:
scan.addColumn(Bytes.toBytes("family1"),
    Bytes.toBytes("col1"));
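The same restriction works for single-row reads, since Get exposes the same addFamily() and addColumn() methods. A minimal sketch (the row key and names are placeholders):
Get get = new Get(Bytes.toBytes("row1"));
get.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("col1")); // only this cell
Result result = table.get(get);
byte[] value = result.getValue(Bytes.toBytes("family1"), Bytes.toBytes("col1"));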
The block cache
When scanning a large number of rows sequentially (say, in MapReduce), it is
recommended that you turn off the block cache. Turning off a cache might seem
completely counterintuitive. However, caches are only effective when the same rows
are accessed repeatedly. During a sequential scan, each block is read only once, so
leaving the block cache on introduces a lot of churn in the cache (new data is
constantly brought into the cache, and old data is evicted to make room for it).
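The block cache is controlled per scan. A minimal sketch of a scan configured for a one-time sequential read (the caching value is illustrative):
Scan scan = new Scan();
scan.setCaching(1000);       // still batch many rows per RPC
scan.setCacheBlocks(false);  // don't evict hot data with blocks read only once
If this Scan is handed to a MapReduce job via TableMapReduceUtil, the setting carries over to the job's mappers.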
 