Database Reference
should be deleted for the corresponding row and column family:column
qualifier.
Once an HBase environment is installed, the user can enter the HBase shell
environment by entering hbase shell at the command prompt. An HBase table,
my_table, can then be created as follows:
$ hbase shell
hbase> create 'my_table', 'cf1', 'cf2',
{SPLITS =>['250000','500000','750000']}
Two column families, cf1 and cf2, are defined in the table. The SPLITS option
specifies how the table will be divided based on the row portion of the key. In
this example, the table is split into four parts, called regions. Rows less than
250000 are added to the first region; rows from 250000 to less than 500000
are added to the second region, and likewise for the remaining splits. These splits
provide the primary mechanism for achieving real-time read and write access.
In this example, each of the four regions of my_table resides on its own worker
node in the Hadoop cluster. Thus, as the table size or the user load increases,
additional worker nodes and region splits can be added to scale the cluster
appropriately. Reads and writes are directed by the row portion of the key, so
HBase can quickly determine the appropriate region for a read or write
command. More about regions and their implementation will be discussed later.
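The way a row is matched to a region can be illustrated with a short Python sketch. This is not HBase code; the function name region_index is hypothetical, and the sketch assumes fixed-width, zero-padded row keys, because HBase compares row keys as byte strings rather than as numbers. It only mimics the lookup against the three split points used in the create command above.

```python
from bisect import bisect_right

# Split points taken from the create command; stored as bytes because
# HBase orders row keys lexicographically by byte value.
SPLITS = [b"250000", b"500000", b"750000"]

def region_index(row_key: bytes, splits=SPLITS) -> int:
    """Return the 0-based index of the region whose key range holds row_key.

    Hypothetical helper for illustration only. Each region covers
    [start_key, end_key), so a key equal to a split point belongs to
    the region that starts at that split point.
    """
    return bisect_right(splits, row_key)

if __name__ == "__main__":
    for key in (b"000042", b"249999", b"250000", b"612345", b"999999"):
        print(key.decode(), "-> region", region_index(key) + 1)
```

Because the lookup is a binary search over a sorted list of split points, its cost grows only logarithmically with the number of regions, which is consistent with the quick routing described above.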
Only column families, not column qualifiers, need to be defined during HBase
table creation. New column qualifiers can be defined whenever data is written
to the HBase table. Unlike in most relational databases, where a database
administrator must add a column and define its data type, columns can be
added to an HBase table as the need arises. Such flexibility is one of the strengths of
HBase and is certainly desirable when dealing with unstructured data. Over time,
the unstructured data will likely change. Thus, the new content with new column
qualifiers must be extracted and added to the HBase table.
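The difference between column families, which are fixed at table creation, and column qualifiers, which appear on first write, can be sketched with a toy in-memory model. The class ToyTable and its put method are hypothetical stand-ins written for this illustration and are not part of the HBase API; the sample values are invented as well.

```python
class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase API)."""

    def __init__(self, *families):
        self.families = set(families)  # column families are fixed at creation
        self.rows = {}                 # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            # Writing to an undeclared column family is rejected.
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted; it comes into existence on first write.
        self.rows.setdefault(row, {})[f"{family}:{qualifier}"] = value

table = ToyTable("cf1", "cf2")
table.put("000042", "cf1:name", "Ada")
table.put("000042", "cf1:email", "ada@example.com")  # new qualifier, no schema change
print(sorted(table.rows["000042"]))
```

In this sketch, a new qualifier such as cf1:email needs no prior declaration, whereas a write to an undeclared family such as cf3 fails.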
Column families help to define how the table will be physically stored. An HBase
table is split into regions, and each region is in turn split into column families
that are stored separately in HDFS. From the Linux command prompt, running hadoop
fs -ls -R /hbase shows how the HBase table, my_table, is stored in HDFS.
$ hadoop fs -ls -R /hbase