e40be0371f43135e36ea67edec6e31e3/cf1
0 2014-02-28 16:40 /hbase/my_table/e40be0371f43135e36ea67edec6e31e3/cf2
As can be seen, four subdirectories have been created under /hbase/my_table.
Each subdirectory is named by taking the hash of its respective region name, which
includes the start and end rows. Under each of these directories are the directories
for the column families, cf1 and cf2 in the example, and the .regioninfo file, which
contains several options and attributes for how the regions will be
maintained. The column family directories store keys and values for the
corresponding column qualifiers. The column qualifiers from one column family
should seldom be read with the column qualifiers from another column family. The
reason for the separate column families is to minimize the amount of unnecessary
data that HBase has to sift through within a region to find the requested data.
Requesting data from two column families means that multiple directories have to
be scanned to pull all the desired columns, which defeats the purpose of creating
the column families in the first place. In such cases, the table design may be better
off with just one column family. In practice, the number of column families should
be no more than two or three. Otherwise, performance issues may arise [30].
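The cost argument above can be illustrated with a toy model. The sketch below is not HBase code. It assumes that each column family behaves as a separate store, standing in for the per-family directory under a region, and it uses a hypothetical helper, families_touched, to count how many of those stores a request would have to open.

```python
# Toy model only: each column family is treated as a separate store,
# standing in for the per-family directory under a region.
# The helper name and layout are hypothetical, not part of HBase.

def families_touched(requested_columns):
    """Return the set of column families a read would need to open.

    Each requested column is written as 'family:qualifier'.
    """
    return {column.split(":", 1)[0] for column in requested_columns}


# Both qualifiers live in cf1, so only one family store is read.
same_family = families_touched(["cf1:cq1", "cf1:cq2"])

# The qualifiers span cf1 and cf2, so two family stores are read.
cross_family = families_touched(["cf1:cq1", "cf2:cq3"])

print(len(same_family), len(cross_family))
```

If most requests look like the second case, the model suggests the split into families is not paying for itself, which is the situation in which a single column family may be the better design.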
The following operations add data to the table using the put command. From
these three put operations, data1 and data2 are entered into column qualifiers
cq1 and cq2, respectively, in column family cf1. The value data3 is entered into
column qualifier cq3 in column family cf2. The row is designated by row key
000700 in each operation.
hbase> put 'my_table', '000700', 'cf1:cq1', 'data1'
0 row(s) in 0.0030 seconds
hbase> put 'my_table', '000700', 'cf1:cq2', 'data2'
0 row(s) in 0.0030 seconds
hbase> put 'my_table', '000700', 'cf2:cq3', 'data3'
0 row(s) in 0.0040 seconds
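To make the resulting logical row easier to picture, the following sketch replays the three put operations against a plain Python dictionary. It illustrates the data model only: the put helper is hypothetical, no HBase client library is called, and timestamps and cell versions are ignored.

```python
from collections import defaultdict

# Illustrative layout (an assumption, not HBase's storage format):
# table[row_key][column_family][column_qualifier] = value
table = defaultdict(lambda: defaultdict(dict))


def put(table, row_key, column, value):
    """Store a value under row_key, with column given as 'family:qualifier'."""
    family, qualifier = column.split(":", 1)
    table[row_key][family][qualifier] = value


# Replay the three shell operations shown above.
put(table, "000700", "cf1:cq1", "data1")
put(table, "000700", "cf1:cq2", "data2")
put(table, "000700", "cf2:cq3", "data3")

row = table["000700"]
print(sorted(row))   # column families present in the row
print(row["cf1"])    # qualifiers and values stored in cf1
print(row["cf2"])    # qualifiers and values stored in cf2
```

All three values end up under the single row key 000700, grouped first by column family and then by column qualifier.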