To turn off the WAL, set the durability on each Put:
put.setDurability(Durability.SKIP_WAL);
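For context, here is a minimal sketch of a complete write with the WAL skipped, assuming the HBase 1.x client API; the table name mytable, the row key, and the column names are placeholders:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf("mytable"))) {
    Put put = new Put(Bytes.toBytes("row1"));
    put.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("col1"),
        Bytes.toBytes("value1"));
    // Skip the write-ahead log for this mutation; faster, but the edit is
    // lost if the region server fails before the MemStore is flushed.
    put.setDurability(Durability.SKIP_WAL);
    table.put(put);
}
SKIP_WAL trades durability for write throughput, so it is only appropriate for data you can regenerate if a region server goes down.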
More tips for high-performing HBase reads
So far, we have looked at tips for writing data into HBase. Now, let's take a look at some
tips for reading data faster.
The scan cache
When reading a large number of rows, it is better to set scan caching to a high
number (in the hundreds or thousands). Otherwise, each row that is scanned results
in a separate round trip to the HRegionServer. This is especially encouraged for
MapReduce jobs, as they will likely consume a large number of rows sequentially.
To set scan caching, use the following code:
Scan scan = new Scan();
scan.setCaching(1000);
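To see the setting in context, here is a rough sketch of a cached scan being consumed through a ResultScanner (ResultScanner and Result live in org.apache.hadoop.hbase.client; the table handle and column names are placeholders, and 1,000 is just an example value to tune against your row size and client memory):
Scan scan = new Scan();
scan.setCaching(1000); // fetch 1,000 rows per RPC instead of the default
try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result result : scanner) {
        // Most iterations are served from the client-side row cache,
        // so only every 1,000th row triggers a trip to the region server.
        byte[] value = result.getValue(Bytes.toBytes("family1"),
            Bytes.toBytes("col1"));
        // process value...
    }
}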
Only read the families or columns needed
When fetching a row, HBase returns all the families and all the columns by default.
If you only care about one family or a few columns, specifying them saves
needless I/O.
To specify a family, use this:
scan.addFamily(Bytes.toBytes("family1"));
To specify columns, use this:
scan.addColumn(Bytes.toBytes("family1"),
    Bytes.toBytes("col1"));
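The same restriction works for single-row reads, since Get exposes the same addFamily() and addColumn() methods. A minimal sketch (the row key and names are placeholders):
Get get = new Get(Bytes.toBytes("row1"));
get.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("col1")); // only this cell
Result result = table.get(get);
byte[] value = result.getValue(Bytes.toBytes("family1"), Bytes.toBytes("col1"));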
The block cache
When scanning a large number of rows sequentially (say, in MapReduce), it is
recommended that you turn off the block cache. Turning off a cache might seem
completely counterintuitive. However, caches are only effective when the same rows
are accessed repeatedly. During a sequential scan, each block is read only once, so
leaving the block cache on introduces a lot of churn in the cache (new data is
constantly brought into the cache, and old data is evicted to make room for it).
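The block cache is controlled per scan. A minimal sketch of a scan configured for a one-time sequential read (the caching value is illustrative):
Scan scan = new Scan();
scan.setCaching(1000);       // still batch many rows per RPC
scan.setCacheBlocks(false);  // don't evict hot data with blocks read only once
If this Scan is handed to a MapReduce job via TableMapReduceUtil, the setting carries over to the job's mappers.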
 