Advanced Analytics—Technology and Tools: MapReduce and Hadoop - Data Science and Big Data Analytics

Figure 10.6 Choosing a shipping address at checkout

Of course, in addition to a customer's shipping address, other customer information, such as billing address, preferences, billing credits/debits, and customer benefits (for example, free shipping), must be stored. This type of application requires real-time access, so batch processing with Pig, Hive, or Hadoop's MapReduce is not a reasonable implementation approach. The following discussion examines how HBase stores the data and provides real-time read and write access.

As mentioned, HBase is built on top of HDFS. HBase uses a key/value structure to store the contents of an HBase table. Each value is the data to be stored at the intersection of the row, column, and version. Each key consists of the following elements [29]:

• Row length

• Row (sometimes called the row key)

• Column family length

• Column family

• Column qualifier

• Version

• Key type
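The key layout listed above can be sketched in a few lines of Python. This is an illustration only, not HBase's own code: the names `CellKey`, `encode_key`, `decode_key`, and `PUT` are hypothetical, and the field widths (2-byte row length, 1-byte column family length, 8-byte version, 1-byte key type) and the sample row and column values are assumptions chosen for the example. Because the list has no length field for the column qualifier, the sketch infers its length from whatever bytes remain between the column family and the fixed-size trailer.

```python
import struct
from typing import NamedTuple

PUT = 4  # assumed marker value for a "put" key type (illustrative)


class CellKey(NamedTuple):
    """The elements of a key, in the order listed in the text."""
    row: bytes        # row (row key)
    family: bytes     # column family
    qualifier: bytes  # column qualifier
    version: int      # version, stored here as a 64-bit integer
    key_type: int     # key type


def encode_key(key: CellKey) -> bytes:
    """Serialize a key: lengths precede the row and the column family."""
    return b"".join([
        struct.pack(">H", len(key.row)),      # row length (assumed 2 bytes)
        key.row,
        struct.pack(">B", len(key.family)),   # family length (assumed 1 byte)
        key.family,
        key.qualifier,                        # no explicit length field
        struct.pack(">qB", key.version, key.key_type),  # 9-byte trailer
    ])


def decode_key(buf: bytes) -> CellKey:
    """Parse the bytes produced by encode_key back into a CellKey."""
    (row_len,) = struct.unpack_from(">H", buf, 0)
    pos = 2
    row = buf[pos:pos + row_len]
    pos += row_len
    fam_len = buf[pos]
    pos += 1
    family = buf[pos:pos + fam_len]
    pos += fam_len
    trailer = len(buf) - 9                    # version (8) + key type (1)
    qualifier = buf[pos:trailer]
    version, key_type = struct.unpack_from(">qB", buf, trailer)
    return CellKey(row, family, qualifier, version, key_type)


# Hypothetical cell: the shipping address of one customer.
key = CellKey(b"cust-0001", b"addr", b"shipping", 1700000000000, PUT)
encoded = encode_key(key)
```

Decoding `encoded` with `decode_key` returns the original `CellKey`, which shows that the row, column family, column qualifier, version, and key type can all be recovered from a single byte string stored alongside the value.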

The row is used as the primary attribute to access the contents of an HBase table. The row is the basis for how the data is distributed across the cluster and allows a query of an HBase table to quickly retrieve the desired elements. Thus, the structure or layout of the row has to be specifically designed based on how the data will be accessed. In this respect, an HBase table is purpose-built and is not intended
