Databases Reference
In-Depth Information
Meetup (www.meetup.com) is a popular site that helps user groups and interest groups
organize local events and meetings. Meetup has grown from a small, unknown site in 2001
to 8 million members in 100 countries, 65,000+ organizers, 80,000+ meetup groups, and 50,000
meetups each week (http://online.wsj.com/article/SB10001424052748704170404575624733792905708.html).
Meetup is an HBase user. All group activity is written directly to HBase and is
indexed per member. A member's custom feed is served directly from HBase.
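One way to picture "indexed per member" in a sorted, row-key-oriented store is a composite row key that starts with the member ID, so a member's feed becomes a single prefix scan. The sketch below is only an illustration under stated assumptions: it is not Meetup's actual schema, it does not use the HBase client API, and `row_key`, `SortedTable`, and `member_feed` are hypothetical names backed by an in-memory sorted list.

```python
from bisect import bisect_left, insort

# Assumed upper bound on millisecond timestamps (13 decimal digits).
MAX_TS = 10**13 - 1


def row_key(member_id: int, ts_ms: int) -> str:
    """Build a hypothetical composite key: member ID, then reversed timestamp.

    Fixed-width fields make string order match numeric order, and
    subtracting the timestamp from MAX_TS sorts newer activity first.
    """
    return f"{member_id:010d}#{MAX_TS - ts_ms:013d}"


class SortedTable:
    """In-memory stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def scan_prefix(self, prefix, limit):
        """Return up to `limit` values whose keys start with `prefix`."""
        start = bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(out) == limit:
                break
            out.append(self._rows[key])
        return out


def member_feed(table, member_id, limit=10):
    """Read a member's newest activity with one contiguous scan."""
    return table.scan_prefix(f"{member_id:010d}#", limit)
```

Because all of a member's rows sit next to each other in key order, reading the feed touches only that member's slice of the table. A query that ignores the key prefix (for example, "all activity containing a given word") would have to walk every row.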
Facebook is another big user of HBase. Facebook messaging is built on HBase. Facebook was
the number one destination site on the Internet in 2010. It has grown to more than 500 million
active users (www.facebook.com/press/info.php?statistics) and is the largest software
application in terms of the number of users. Facebook messaging is a robust infrastructure
that integrates chat, SMS, and e-mail. Hundreds of billions of messages are sent every month
through this messaging infrastructure. The engineering team at Facebook shared a few notes
on using HBase for their messaging infrastructure. Read the notes online at
www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919.
HBase has some inherent advantages when it comes to scaling systems. HBase supports automatic load
balancing, failover, compression, and multiple shards per server. HBase works well with the Hadoop
distributed filesystem (a.k.a. HDFS, which is a massively scalable distributed filesystem). You know
from earlier chapters that HDFS replicates and automatically re-balances to easily accommodate
large files that span multiple servers. Facebook chose HBase to leverage many of these features.
HBase is a necessity for handling the number of messages and users they serve. The Facebook
engineering notes also mention that the messages in their infrastructure are short, volatile, and
temporal and are rarely accessed later. HBase, and Bigtable clones in general, are particularly
suitable when ad-hoc querying of data is not important. From earlier chapters, you know that
HBase supports the querying of data sets but is a weak replacement for an RDBMS as far as its
querying capabilities are concerned. Infrastructures like Google App Engine (GAE) successfully
expose a data modeling API, with advanced querying capabilities, on top of Bigtable. More
information on querying is covered in a section titled “Querying Support,” later in this chapter.
So it seems clear that column-family-centric NoSQL databases are a good choice if extreme
scalability is a requirement. However, such databases may not be the best choice for all types of
systems, especially those that involve real-time transaction processing. An RDBMS often makes
a better choice than any NoSQL flavor if transactional integrity is very important. Eventually
consistent NoSQL options, like Cassandra or Riak, may be workable if weaker consistency is
acceptable. Amazon has demonstrated that massively scalable e-commerce operations may be a use
case for eventually consistent data stores, but examples beyond Amazon where such models apply
well are hard to find. Databases like Cassandra follow the Amazon Dynamo paradigm and support
eventual consistency. Cassandra promises incredibly fast read and write speeds. Cassandra also
supports Bigtable-like column-family-centric data modeling. Amazon Dynamo also inspired Riak.
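To make "eventual consistency" concrete, here is a minimal toy model of a Dynamo-style replicated store. It is a sketch under stated assumptions, not how Cassandra or Riak are implemented: `Replica`, `Cluster`, and `anti_entropy` are hypothetical names, versions are plain integers supplied by the caller, and the network is replaced by list indexing. A write is acknowledged by `w` replicas and a read consults `r` replicas; when `w + r` is not greater than the replica count, a read can miss the newest write until the replicas converge.

```python
class Replica:
    """One copy of the data; keeps the highest version seen per key."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key)


class Cluster:
    """Toy cluster of n replicas (hypothetical, for illustration only)."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]

    def put(self, key, version, value, w):
        # Only the first w replicas receive the write; the rest lag behind.
        for replica in self.replicas[:w]:
            replica.write(key, version, value)

    def get(self, key, r, start=0):
        # Ask r replicas, beginning at index `start`, and keep the newest answer.
        n = len(self.replicas)
        answers = [self.replicas[(start + i) % n].read(key) for i in range(r)]
        answers = [a for a in answers if a is not None]
        if not answers:
            return None
        return max(answers, key=lambda a: a[0])[1]

    def anti_entropy(self, key):
        # Background repair: copy the newest version to every replica.
        known = [rep.read(key) for rep in self.replicas]
        known = [k for k in known if k is not None]
        if known:
            version, value = max(known, key=lambda k: k[0])
            for rep in self.replicas:
                rep.write(key, version, value)
```

With three replicas, a write acknowledged by only one replica followed by a read from a different single replica returns the older value. Reading from all three, or running the repair step first, returns the newer one. That window of staleness is the trade accepted in exchange for fast, highly available writes.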
Riak supports a document store abstraction in addition to being an eventually consistent store. Both
Cassandra and Riak scale well in horizontal clusters, but if scalability is of paramount importance,
my vote goes in favor of HBase or Hypertable over the eventually consistent stores. Eventually
consistent stores perhaps fare better than sorted ordered column-family stores where
write throughput and latency are important. Therefore, if both horizontal scalability and high write
throughput are required, consider Cassandra or Riak. Even in these cases, consider a hybrid