Introducing HBase - HBase Essentials - page 4

Database Reference

In-Depth Information

• Meetup ( www.meetup.com ): Meetup uses HBase to power a site-wide,

real-time activity feed system for all of its members and groups. In its

architecture, group activity is written directly to HBase and indexed per

member, with the member's custom feed served directly from HBase for

incoming requests.

• Twitter ( www.twitter.com ): Twitter uses HBase to provide a distributed,

read/write backup of all the transactional tables in Twitter's production

backend. Later, this backup is used to run MapReduce jobs over the data.

Additionally, its operations team uses HBase as a time series database for

cluster-wide monitoring / performance data.

• Yahoo ( www.yahoo.com ): Yahoo uses HBase to store document ingerprints

for detecting near-duplications. With millions of rows in the HBase table,

Yahoo runs a query for inding duplicated documents with real-time trafic.

The source for the preceding mentioned information is http://

wiki.apache.org/hadoop/Hbase/PoweredBy .

Installing HBase

HBase is an Apache project and the current Version, 0.98.7, of HBase is available as

a stable release. HBase Version 0.98.7 supersedes Version 0.94.x and 0.96.x.

This topic only focuses on HBase Version 0.98.7 , as this version is fully

supported and tested with Hadoop Versions 2.x and deprecates the use

of Hadoop 1.x.

Hadoop 2.x is much faster compared to Hadoop 1.x and includes

important bug ixes that will improve the overall HBase performance.

Older versions, 0.96.x, of HBase which are now extinct, supported both

versions of Hadoop (1.x and 2.x). The HBase version prior to 0.96.x only

supported Hadoop 1.x.

HBase is written in Java, works on top of Hadoop, and relies on ZooKeeper. A

HBase cluster can be set up in either local or distributed mode. Distributed mode

can further be classiied into either pseudo-distributed or fully distributed mode.

Next Page

HBase Essentials

Search WWH ::

Custom Search

Home