HBase - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

HBase

Enter HBase, which has the following characteristics:

No real indexes

Rows are stored sequentially, as are the columns within each row. Therefore, no issues

with index bloat, and insert performance is independent of table size.

Automatic partitioning

As your tables grow, they will automatically be split into regions and distributed across

all available nodes.

Scale linearly and automatically with new nodes

Add a node, point it to the existing cluster, and run the regionserver. Regions will auto-

matically rebalance, and load will spread evenly.

Commodity hardware

Clusters are built on $1,000-$5,000 nodes rather than $50,000 nodes. RDBMSs are I/O

hungry, requiring more costly hardware.

Fault tolerance

Lots of nodes means each is relatively insignificant. No need to worry about individual

node downtime.

Batch processing

MapReduce integration allows fully parallel, distributed jobs against your data with

locality awareness.

If you stay up at night worrying about your database (uptime, scale, or speed), you should

seriously consider making a jump from the RDBMS world to HBase. Use a solution that

was intended to scale rather than a solution based on stripping down and throwing money

at what used to work. With HBase, the software is free, the hardware is cheap, and the dis-

tribution is intrinsic.

Search WWH ::

Custom Search

Home