Praxis
In this section, we discuss some of the common issues users run into when running an
HBase cluster under load.
HDFS
HBase's use of HDFS is very different from how it's used by MapReduce. In MapReduce,
generally, HDFS files are opened with their content streamed through a map task and then
closed. In HBase, datafiles are opened on cluster startup and kept open so that we avoid
paying the costs associated with opening files on each access. Because of this, HBase tends
to see issues not normally encountered by MapReduce clients:
Running out of file descriptors
Because we keep files open, on a loaded cluster it doesn't take long before we run into
system- and Hadoop-imposed limits. For instance, say we have a cluster of three nodes,
each running an instance of a datanode and a regionserver, and we're running an upload
into a table that currently has 100 regions and 10 column families. Assume that each
column family has on average two flush files. Doing the math, we can have 100 × 10 × 2,
or 2,000, files open at any one time. Add to this total the miscellaneous descriptors
consumed by outstanding scanners and Java libraries. Each open file also consumes at
least one descriptor on the remote datanode.
The default limit on the number of file descriptors per process is 1,024. When we exceed
this ulimit, we'll see the complaint about “Too many open files” in the logs, but often
we'll first see indeterminate behavior in HBase. The fix requires increasing the file
descriptor ulimit; 10,240 is a common setting. Consult the HBase Reference Guide for how
to increase the ulimit on your cluster.
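As a rough illustration of what this looks like on a typical Linux system (the exact
files and values vary by distribution, and the hadoop user name below is only an
example), you might check the current limit and then raise it in
/etc/security/limits.conf for the user running the regionserver and datanode processes:

    # Check the per-process file descriptor limit for the current user
    ulimit -n

    # /etc/security/limits.conf -- raise the soft and hard limits for the hadoop user
    hadoop  -  nofile  10240

The new limit applies only to sessions started after the change, so the regionserver and
datanode processes must be restarted before it takes effect.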
Running out of datanode threads
Similarly, the Hadoop datanode has an upper bound on the number of threads it can run at
any one time for data transfer. Hadoop 1 had a low default of 256 for this setting
(dfs.datanode.max.xcievers), which would cause HBase to behave erratically. Hadoop 2
increased the default to 4,096, so you are much less likely to see a problem with recent
versions of HBase (which run only on Hadoop 2 and later). You can change the setting by
configuring dfs.datanode.max.transfer.threads (the new name for this property) in
hdfs-site.xml.
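If you do need to raise it, the property goes into hdfs-site.xml on each datanode; a
minimal sketch might look like the following (the value of 8,192 is only an example, not
a recommendation from this text):

    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>8192</value>
    </property>

The datanodes must be restarted for the change to take effect.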