Advanced Analytics—Technology and Tools: MapReduce and Hadoop - Data Science and Big Data Analytics


should be deleted for the corresponding row and column family:column qualifier.

Once an HBase environment is installed, the user can start the HBase shell by typing hbase shell at the command prompt. An HBase table, my_table, can then be created as follows:

$ hbase shell
hbase> create 'my_table', 'cf1', 'cf2',
  {SPLITS => ['250000', '500000', '750000']}

Two column families, cf1 and cf2, are defined in the table. The SPLITS option specifies how the table is divided based on the row portion of the key. In this example, the table is split into four parts, called regions. Rows less than 250000 are added to the first region; rows from 250000 up to, but not including, 500000 are added to the second region; and likewise for the remaining splits. These splits provide the primary mechanism for achieving real-time read and write access.
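The split-key routing can be sketched in a few lines of Python. This is a minimal illustration, assuming the three split keys from the create command and treating row keys as strings; the function name region_index is hypothetical and not part of any HBase API. A binary search over the sorted split keys finds the region for a given key:

```python
from bisect import bisect_right
from typing import Sequence

# Split keys from the create command. Python compares str values by code
# point, which for these ASCII keys matches HBase's byte-wise key ordering.
SPLITS: Sequence[str] = ("250000", "500000", "750000")


def region_index(row_key: str, splits: Sequence[str] = SPLITS) -> int:
    """Return the 0-based index of the region whose key range holds row_key.

    Region i covers [splits[i - 1], splits[i]); the first region has no
    lower bound and the last region has no upper bound.
    """
    # bisect_right counts the split keys <= row_key, which is exactly the
    # index of the region whose start key is at or before row_key.
    return bisect_right(splits, row_key)


for key in ("100000", "250000", "499999", "500000", "900000", "3"):
    print(f"{key!r} -> region {region_index(key)}")
```

With these split keys, '250000' opens the second region (index 1). The short key '3' also lands in region 1, because as a string it sorts after '250000' but before '500000'; fixed-width keys like the six-digit ones above avoid that surprise.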

In this example, my_table is split into four regions, each on its own worker node in the Hadoop cluster. Thus, as the table size or the user load increases, additional worker nodes and region splits can be added to scale the cluster appropriately. Reads and writes are located by row key, so HBase can quickly determine the appropriate region to which to direct a read or write command. Regions and their implementation are discussed in more detail later.
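To make the scaling idea concrete, here is a toy routing table in Python. It is an assumption-laden sketch, not how HBase is implemented: the class RegionMap, its methods, and the worker names are hypothetical, and in a real cluster HBase itself decides when a region splits and where each region is served. The sketch only shows that adding one split key (and one worker) changes routing for part of one region and leaves every other key routed as before.

```python
from bisect import bisect_right
from typing import List, Sequence


class RegionMap:
    """Toy routing table: sorted split keys plus one worker name per region."""

    def __init__(self, splits: Sequence[str], workers: Sequence[str]) -> None:
        if list(splits) != sorted(splits):
            raise ValueError("split keys must be given in sorted order")
        # n split keys define n + 1 regions, so n + 1 workers are required.
        if len(workers) != len(splits) + 1:
            raise ValueError("need exactly one worker per region")
        self.splits: List[str] = list(splits)
        self.workers: List[str] = list(workers)

    def route(self, row_key: str) -> str:
        """Return the worker serving the region that contains row_key."""
        return self.workers[bisect_right(self.splits, row_key)]

    def split_region(self, split_key: str, new_worker: str) -> None:
        """Split the region containing split_key at that key.

        The lower half [start, split_key) stays on its current worker;
        the upper half [split_key, end) is assigned to new_worker.
        """
        if split_key in self.splits:
            raise ValueError(f"{split_key!r} is already a region boundary")
        i = bisect_right(self.splits, split_key)
        self.splits.insert(i, split_key)
        self.workers.insert(i + 1, new_worker)


# Hypothetical worker names; four regions, as in my_table.
regions = RegionMap(["250000", "500000", "750000"],
                    ["worker1", "worker2", "worker3", "worker4"])
print(regions.route("600000"))             # worker3
regions.split_region("875000", "worker5")  # scale out the last region
print(regions.route("800000"))             # worker4
print(regions.route("900000"))             # worker5
```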

Only column families, not column qualifiers, need to be defined when an HBase table is created. New column qualifiers can be defined whenever data is written to the table. Unlike in most relational databases, where a database administrator must add a column and define its data type, columns can be added to an HBase table as the need arises. Such flexibility is one of the strengths of HBase and is certainly desirable when dealing with unstructured data. Over time, the unstructured data will likely change, so new content with new column qualifiers must be extracted and added to the HBase table.
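The contrast between fixed column families and free-form qualifiers can be modeled with a small in-memory class. This is a hypothetical sketch (ToyTable is not an HBase class) that stores values as raw bytes, since HBase itself does not type cell values. In the HBase shell, the first write below would correspond to put 'my_table', '100000', 'cf1:title', 'first value'.

```python
from typing import Dict, Iterable, List


class ToyTable:
    """In-memory stand-in for an HBase table (illustration only).

    Column families are fixed when the table is created; column
    qualifiers come into existence with the first write that uses them.
    """

    def __init__(self, name: str, families: Iterable[str]) -> None:
        self.name = name
        self.families = frozenset(families)
        # row key -> {"family:qualifier": value}
        self.rows: Dict[str, Dict[str, bytes]] = {}

    def put(self, row: str, column: str, value: bytes) -> None:
        family = column.partition(":")[0]
        if family not in self.families:
            # As in HBase, writing to an undeclared family is an error.
            raise KeyError(f"column family {family!r} does not exist")
        # Any qualifier is accepted; no schema change is needed first.
        self.rows.setdefault(row, {})[column] = value

    def columns(self, row: str) -> List[str]:
        """Return the sorted column names currently present in a row."""
        return sorted(self.rows.get(row, {}))


table = ToyTable("my_table", ["cf1", "cf2"])
table.put("100000", "cf1:title", b"first value")
table.put("100000", "cf1:language", b"second value")  # new qualifier on the fly
table.put("100000", "cf2:source", b"third value")
print(table.columns("100000"))  # ['cf1:language', 'cf1:title', 'cf2:source']

try:
    table.put("100000", "cf3:extra", b"rejected")  # cf3 was never declared
except KeyError as err:
    print("rejected:", err)
```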

Column families help define how the table is physically stored. An HBase table is split into regions, and each region is in turn split into column families that are stored separately in HDFS. From the Linux command prompt, running hadoop fs -ls -R /hbase shows how the HBase table my_table is stored in HDFS.

$ hadoop fs -ls -R /hbase
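As a rough model of this physical layout, the sketch below groups cells first by region and then by column family, so that each (region, family) pair gets its own sorted store. It is an illustration under stated assumptions: the function names and the directory strings produced by store_path are hypothetical and do not reproduce HBase's actual HDFS paths, which vary by version.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Cell = Tuple[str, str, bytes]  # (row key, "family:qualifier", value)

SPLITS: Sequence[str] = ("250000", "500000", "750000")


def store_layout(cells: Iterable[Cell],
                 splits: Sequence[str] = SPLITS) -> Dict[Tuple[int, str], List[Cell]]:
    """Group cells into one store per (region index, column family).

    Each store's cells are sorted by row key, then column, as a rough
    analogue of the sorted files HBase keeps per region and family.
    """
    stores: Dict[Tuple[int, str], List[Cell]] = defaultdict(list)
    for row, column, value in cells:
        family = column.partition(":")[0]
        stores[(bisect_right(splits, row), family)].append((row, column, value))
    return {key: sorted(group) for key, group in stores.items()}


def store_path(table: str, region: int, family: str) -> str:
    """Hypothetical directory name for a store; real HBase paths differ."""
    return f"/hbase/{table}/region-{region}/{family}"


cells = [
    ("100000", "cf1:title", b"a"),
    ("100000", "cf2:source", b"b"),
    ("600000", "cf1:title", b"c"),
]
for (region, family), group in sorted(store_layout(cells).items()):
    print(store_path("my_table", region, family), len(group))
```

Because cf1 and cf2 land in different stores even for the same row, a scan that touches only one family need not read the other family's data, which is one reason family design matters for physical storage.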
