Setting memory buffers
We can control when puts are flushed by setting the client write buffer
option. Puts accumulate in a buffer on the client, and once the buffered data
exceeds this setting, the client sends it to the region server in one batch. The
default setting is 2 MB. Its purpose is to limit how much data is held in the
buffer before it is written out.
There are two ways of setting this:
• In hbase-site.xml (this setting will be cluster-wide):
<property>
<name>hbase.client.write.buffer</name>
<value>8388608</value> <!-- 8 MB -->
</property>
• In the application (applies only to that application):
htable.setWriteBufferSize(1024*1024*10); // 10 MB
Keep in mind that a bigger buffer takes more memory on both the client side and the
server side. As a practical guideline, estimate how much memory you can dedicate
to the client and put the rest of the load on the cluster.
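To get a feel for how the buffer size affects the number of flushes, here is a minimal sketch of a size-triggered buffer. It is a toy model, not the HBase client: the class WriteBufferSketch, its methods, and the workload of 10,000 puts of 1 KB each are all assumptions made for illustration.

```java
/** Toy model of a size-triggered client write buffer; not the HBase client API. */
final class WriteBufferSketch {
    private final long limitBytes;   // plays the role of hbase.client.write.buffer
    private long bufferedBytes = 0;  // bytes currently waiting in the buffer
    private int bufferedPuts = 0;    // puts currently waiting in the buffer
    private int flushes = 0;         // number of batches sent out so far

    WriteBufferSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Buffers one put and flushes once the buffered size exceeds the limit. */
    void put(int sizeBytes) {
        bufferedBytes += sizeBytes;
        bufferedPuts++;
        if (bufferedBytes > limitBytes) {
            flush();
        }
    }

    /** Sends everything buffered so far as one batch and empties the buffer. */
    void flush() {
        if (bufferedPuts == 0) {
            return;
        }
        flushes++;
        bufferedBytes = 0;
        bufferedPuts = 0;
    }

    int flushes() { return flushes; }

    int bufferedPuts() { return bufferedPuts; }

    /** Counts the automatic flushes triggered by a run of equal-sized puts. */
    static int flushesFor(long limitBytes, int puts, int putSizeBytes) {
        WriteBufferSketch buffer = new WriteBufferSketch(limitBytes);
        for (int i = 0; i < puts; i++) {
            buffer.put(putSizeBytes);
        }
        return buffer.flushes();
    }

    public static void main(String[] args) {
        int puts = 10_000;   // assumed workload
        int putSize = 1024;  // assumed size of one put, in bytes
        System.out.println("2 MB buffer: " + flushesFor(2L * 1024 * 1024, puts, putSize) + " flushes");
        System.out.println("8 MB buffer: " + flushesFor(8L * 1024 * 1024, puts, putSize) + " flushes");
    }
}
```

In this model a larger limit means fewer, larger batches, at the price of more data held in client memory between flushes. Whatever is still in the buffer at the end stays there until an explicit flush.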
Turning off autoflush
If autoflush is enabled, each htable.put() call incurs a round-trip RPC to the
HRegionServer. Turning autoflush off lets the client buffer puts and send them in
batches, which reduces the number of round trips and the latency they add. To turn
it off, use this code:
htable.setAutoFlush(false);
The risk of turning off autoflush is that if the client crashes before the buffered
data is sent to HBase, that data is lost. So when would you want to do it? When
occasional data loss is acceptable and speed is paramount. Also, see the
batch write recommendations we saw previously.
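The trade-off between round trips and data at risk can be sketched with a small counter model. This is an illustration only, not the HBase client: the class AutoFlushSketch is hypothetical, its flushCommits() method merely borrows the name of the HTable call that sends buffered puts, and the model ignores the write buffer size limit.

```java
/** Toy model of a client with an autoflush switch; not the HBase client API. */
final class AutoFlushSketch {
    private final boolean autoFlush;
    private int pendingPuts = 0; // puts held on the client; lost if it crashes now
    private int roundTrips = 0;  // simulated RPC calls to the server
    private int storedPuts = 0;  // puts that have reached the server

    AutoFlushSketch(boolean autoFlush) {
        this.autoFlush = autoFlush;
    }

    /** Accepts one put; with autoflush on, it is sent immediately. */
    void put() {
        pendingPuts++;
        if (autoFlush) {
            flushCommits();
        }
    }

    /** Sends all pending puts to the server in a single simulated round trip. */
    void flushCommits() {
        if (pendingPuts == 0) {
            return;
        }
        roundTrips++;
        storedPuts += pendingPuts;
        pendingPuts = 0;
    }

    int pendingPuts() { return pendingPuts; }

    int roundTrips() { return roundTrips; }

    int storedPuts() { return storedPuts; }

    public static void main(String[] args) {
        AutoFlushSketch on = new AutoFlushSketch(true);
        AutoFlushSketch off = new AutoFlushSketch(false);
        for (int i = 0; i < 1000; i++) {
            on.put();
            off.put();
        }
        System.out.println("autoflush on:  round trips=" + on.roundTrips()
                + ", at risk=" + on.pendingPuts());
        System.out.println("autoflush off: round trips=" + off.roundTrips()
                + ", at risk=" + off.pendingPuts());
        off.flushCommits(); // an explicit flush sends the whole backlog at once
        System.out.println("after flush:   round trips=" + off.roundTrips()
                + ", stored=" + off.storedPuts());
    }
}
```

With autoflush on, nothing is ever at risk on the client, but every put pays for a round trip. With it off, the pending count is exactly the data that a client crash would lose.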
Turning off WAL
Before we discuss this, we need to emphasize that the write-ahead log (WAL) is
there to prevent data loss when a server crashes. By turning it off, we
bypass this protection, so be very careful when choosing this option. Bulk loading
is one of the cases where turning off the WAL might make sense.
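To see what the WAL protects against, here is a minimal sketch of a server that keeps data in memory and optionally logs each write first. It is a toy model and not HBase code: the class WalSketch and its methods are hypothetical. In the real client, the per-put switch is, to the best of our knowledge, Put.setWriteToWAL(false) in older releases and Put.setDurability(Durability.SKIP_WAL) in later ones; check the Javadoc for your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a server with an in-memory store and a write-ahead log; not HBase code. */
final class WalSketch {
    private final Map<String, String> memory = new HashMap<>(); // lost on a crash
    private final List<String[]> wal = new ArrayList<>();       // survives a crash

    /** Applies a write; when writeToWal is true, the write is logged first. */
    void put(String row, String value, boolean writeToWal) {
        if (writeToWal) {
            wal.add(new String[] {row, value});
        }
        memory.put(row, value);
    }

    /** Simulates a crash (memory wiped) followed by recovery (log replayed in order). */
    void crashAndRecover() {
        memory.clear();
        for (String[] entry : wal) {
            memory.put(entry[0], entry[1]);
        }
    }

    /** Returns the stored value for the row, or null if there is none. */
    String get(String row) {
        return memory.get(row);
    }

    public static void main(String[] args) {
        WalSketch server = new WalSketch();
        server.put("row1", "logged", true);
        server.put("row2", "not logged", false); // WAL bypassed for this write
        server.crashAndRecover();
        System.out.println("row1 after recovery: " + server.get("row1"));
        System.out.println("row2 after recovery: " + server.get("row2"));
    }
}
```

Only the logged write comes back after the simulated crash. Skipping the log saves a write per put, which is why it can be tempting for bulk loads that can simply be rerun.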
 