HBase - Hadoop: The Definitive Guide


HBase Versus RDBMS

HBase and other column-oriented databases are often compared to more traditional and popular relational databases, or RDBMSs. Although they differ dramatically in their implementations and in what they set out to accomplish, they are potential solutions to the same problems, so the comparison is a fair one to make.

As described previously, HBase is a distributed, column-oriented data storage system. It picks up where Hadoop left off by providing random reads and writes on top of HDFS. It has been designed from the ground up with a focus on scale in every direction: tall in numbers of rows (billions), wide in numbers of columns (millions), and able to be horizontally partitioned and replicated across thousands of commodity nodes automatically. The table schemas mirror the physical storage, creating a system for efficient data structure serialization, storage, and retrieval. The burden is on the application developer to make use of this storage and retrieval in the right way.
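
To make the idea of a sparse, horizontally partitioned table more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase client API: the class ToyTable, its method names, the split keys, and the sample column names are all hypothetical, chosen only for illustration. It assumes rows are addressed by a sortable row key and that each partition owns a contiguous range of keys.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model of a sparse table partitioned by row-key range (not the HBase API)."""

    def __init__(self, split_keys):
        # split_keys are the sorted boundaries between key ranges.
        self.split_keys = sorted(split_keys)
        # One dict per partition: row key -> {column name -> value}.
        self.regions = [defaultdict(dict) for _ in range(len(self.split_keys) + 1)]

    def region_index(self, row_key):
        # Binary search over the boundaries picks the partition that owns the key.
        return bisect.bisect_right(self.split_keys, row_key)

    def put(self, row_key, column, value):
        # Random write: only the cells that are set take up any space.
        self.regions[self.region_index(row_key)][row_key][column] = value

    def get(self, row_key, column):
        # Random read: a cell that was never written returns None.
        return self.regions[self.region_index(row_key)].get(row_key, {}).get(column)

    def scan(self, start_key, stop_key):
        # Yield rows in key order for start_key <= key < stop_key,
        # visiting only the partitions that overlap the requested range.
        first = self.region_index(start_key)
        last = self.region_index(stop_key)
        for region in self.regions[first:last + 1]:
            for key in sorted(region):
                if start_key <= key < stop_key:
                    yield key, dict(region[key])


table = ToyTable(split_keys=["g", "p"])
table.put("alice", "info:email", "alice@example.com")
table.put("mallory", "info:email", "mallory@example.com")
table.put("zed", "stats:logins", 7)

print(table.get("zed", "stats:logins"))           # 7
print([key for key, _ in table.scan("a", "n")])   # ['alice', 'mallory']
```

Note that the sketch offers lookups by row key and scans over key ranges, and nothing else. Any other access pattern has to be designed into the row keys and columns by the application developer.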

Strictly speaking, an RDBMS is a database that follows Codd's 12 rules. Typical RDBMSs are fixed-schema, row-oriented databases with ACID properties and a sophisticated SQL query engine. The emphasis is on strong consistency, referential integrity, abstraction from the physical layer, and complex queries through the SQL language. You can easily create secondary indexes; perform complex inner and outer joins; and count, sum, sort, group, and page your data across a number of tables, rows, and columns.
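
As a small illustration of these conveniences, the following sketch uses Python's standard sqlite3 module with an in-memory database. SQLite stands in for a full RDBMS here only because it needs no server; the users and orders tables, their columns, and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is requested per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        amount  INTEGER NOT NULL
    );
    -- A secondary index is a single statement; the engine keeps it up to date.
    CREATE INDEX idx_orders_user ON orders(user_id);
""")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cy")])
conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                 [(1, 30), (1, 20), (2, 5)])

# Outer join, count, sum, group, sort, and page in one declarative query.
rows = conn.execute("""
    SELECT u.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM users AS u
    LEFT OUTER JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY total DESC
    LIMIT 2 OFFSET 0
""").fetchall()
print(rows)  # [('ann', 2, 50), ('bob', 1, 5)]

# Referential integrity: an order that points at a missing user is rejected.
try:
    conn.execute("INSERT INTO orders (user_id, amount) VALUES (?, ?)", (99, 1))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Everything in the query is declared rather than programmed: the engine decides how to use the index, how to join, and how to aggregate. That abstraction from the physical layer is the convenience the rest of this section weighs against scalability.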

For a majority of small- to medium-volume applications, there is no substitute for the ease of use, flexibility, maturity, and powerful feature set of available open source RDBMS solutions such as MySQL and PostgreSQL. However, if you need to scale up in terms of dataset size, read/write concurrency, or both, you'll soon find that the conveniences of an RDBMS come at an enormous performance penalty and make distribution inherently difficult. The scaling of an RDBMS usually involves breaking Codd's rules, loosening ACID restrictions, forgetting conventional DBA wisdom, and, on the way, losing most of the desirable properties that made relational databases so convenient in the first place.

Successful Service

Here is a synopsis of how the typical RDBMS scaling story runs. The following list presumes a successful growing service:

Initial public launch

Move from local workstation to a shared, remotely hosted MySQL instance with a well-defined schema.
