Introducing HBase
A relational database management system (RDBMS) is the right choice for most
online transactional processing (OLTP) applications, and it also supports most
online analytical processing (OLAP) systems. Large OLAP systems, however, usually
run very large queries that scan a wide set of records or an entire dataset containing
billions of records (terabytes or petabytes in size), and they face scaling issues.
Addressing these scaling issues with an RDBMS requires a huge investment, which
becomes another point of concern.
The world of Big Data
Over the last decade, the amount of data being created has grown to more than
20 terabytes per second, and this rate is only increasing. Beyond volume and velocity,
this data also comes in different varieties, that is, structured and semi-structured
in nature, which means that data might be coming from blog posts, tweets, social network
interactions, photos, videos, continuously generated log messages about what users
are doing, and so on. Hence, Big Data is a combination of transactional data and
interactive data. This large set of data is further used by organizations for decision
making. Storing, analyzing, and summarizing these large datasets efficiently and
cost effectively has become one of the biggest challenges for these organizations.
In 2003, Google published a paper on a scalable distributed filesystem titled
Google File System (GFS), which uses a cluster of commodity hardware to store
huge amounts of data and ensures high availability by replicating data
between nodes. Later, Google published an additional paper on processing large,
distributed datasets using MapReduce (MR).
To process Big Data, platforms such as Hadoop, which inherits the basics
from both GFS and MR, were developed and contributed to the community.
A Hadoop-based platform is able to store and process continuously growing
data on the scale of terabytes or petabytes.