Introducing HBase - HBase Essentials

Introducing HBase

A relational database management system (RDBMS) is the right choice for most online transaction processing (OLTP) applications, and it also supports most online analytical processing (OLAP) systems. Large OLAP systems, however, usually run very large queries that scan a wide set of records or an entire dataset containing billions of records (terabytes or petabytes in size), and they face scaling issues. Addressing these scaling issues with an RDBMS requires a huge investment, which becomes another point of concern.

The world of Big Data

Over the last decade, the rate of data creation has grown to more than 20 terabytes per second, and it is only increasing. Beyond volume and velocity, this data also comes in a different variety: it is both structured and semi-structured in nature, coming from blog posts, tweets, social network interactions, photos, videos, continuously generated log messages about what users are doing, and so on. Hence, Big Data is a combination of transactional data and interactive data. Organizations further use this large set of data for decision making. Storing, analyzing, and summarizing these large datasets efficiently and cost-effectively has become one of the biggest challenges for these organizations.

In 2003, Google published a paper on its scalable distributed filesystem, the Google File System (GFS), which uses a cluster of commodity hardware to store huge amounts of data and ensures high availability by replicating data between nodes. Later, Google published an additional paper on processing large, distributed datasets using MapReduce (MR).

For processing Big Data, platforms such as Hadoop, which inherits the basics from both GFS and MR, were developed and contributed to the community. A Hadoop-based platform is able to store and process continuously growing data in terabytes or petabytes.
