The Apache Hadoop software library is a framework that allows the
distributed processing of large datasets across clusters of computers.
However, Hadoop is designed to process data in batch mode, and it lacks the ability
to access data randomly in near real time. In Hadoop, processing many small files
incurs a larger overhead than processing a few big files, which makes Hadoop a poor
choice for low-latency queries.
Later, a class of databases called NoSQL evolved in multiple flavors, such as
key-value stores, document-based stores, column-based stores, and graph-based stores.
NoSQL databases suit different business requirements. These different flavors not
only address scalability and availability but also provide highly efficient reads
and writes on ever-growing data, or, in short, Big Data.
A NoSQL database provides a fail-safe mechanism for the storage
and retrieval of data that is modeled somewhat differently from
the tabular relations used in many relational databases.
The origin of HBase
Looking at the limitations of GFS (Google File System) and MR (MapReduce), Google
developed another solution that not only uses GFS for data storage but also processes
smaller data files very efficiently. They called this new solution BigTable.
BigTable is a distributed storage system for managing structured data
that is designed to scale to a very large size: petabytes of data across
thousands of commodity servers.
Welcome to the world of HBase (http://hbase.apache.org/). HBase is a NoSQL
database that primarily works on top of Hadoop. HBase is based on the storage
architecture followed by BigTable. It inherits its storage design from
column-oriented databases and its data access design from key-value store
databases, which provide key-based access to a specific cell of data.
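To make key-based access to a single cell more concrete, here is a minimal in-memory sketch. It is a toy model, not the HBase client API: the class `CellStore` and its `put`, `get`, and `row` methods are hypothetical. It assumes that each cell is addressed by a row key and a column written as "family:qualifier", that every write is kept as a separate version tagged with a timestamp, and that a read returns the newest version unless an earlier timestamp is requested.

```python
from typing import Optional


class CellStore:
    """Hypothetical toy model of key-based cell access (not the HBase API).

    Each cell is addressed by (row key, column), where the column is written
    as "family:qualifier". Every write is kept as a separate version keyed
    by its timestamp.
    """

    def __init__(self) -> None:
        # (row, column) -> {timestamp: value}
        self._cells: dict[tuple[str, str], dict[int, str]] = {}

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        """Store a new version of the cell at (row, column)."""
        self._cells.setdefault((row, column), {})[ts] = value

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[str]:
        """Return the newest version, or the newest at or before `ts` if given."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        candidates = [t for t in versions if ts is None or t <= ts]
        if not candidates:
            return None
        return versions[max(candidates)]

    def row(self, row: str) -> dict[str, Optional[str]]:
        """Return the newest value of every column in a row, ordered by column name."""
        result: dict[str, Optional[str]] = {}
        for r, col in sorted(self._cells):
            if r == row:
                result[col] = self.get(r, col)
        return result


store = CellStore()
store.put("user1", "info:name", 1, "Alice")
store.put("user1", "info:name", 2, "Alicia")
store.put("user1", "info:email", 1, "alice@example.com")

print(store.get("user1", "info:name"))        # Alicia
print(store.get("user1", "info:name", ts=1))  # Alice
print(store.row("user1"))  # {'info:email': 'alice@example.com', 'info:name': 'Alicia'}
```

The point of the sketch is the addressing scheme: a read names an exact row and column (and optionally a version) instead of scanning a table, which is the key-value side of the design described above.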
In column-oriented databases, data is grouped by columns, and column
values are stored contiguously on disk. Such a design is highly I/O
efficient when dealing with very large datasets used for analytical
queries where not all the columns are needed.
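The I/O argument can be illustrated with a toy example. The sketch below uses a hypothetical three-column table (`id`, `region`, `amount`), stores the same records once row by row and once column by column, and counts how many stored values an aggregate over the single `amount` column has to read. It is a simplification under stated assumptions: "values read" stands in for I/O, and real column stores also work with disk pages, compression, and encodings that this ignores.

```python
# Hypothetical table: each record is (id, region, amount).
records = [
    (1, "east", 120),
    (2, "west", 80),
    (3, "east", 200),
    (4, "north", 50),
]
NUM_COLUMNS = 3

# Row-oriented layout: all fields of a record are stored next to each other.
row_layout = [value for record in records for value in record]

# Column-oriented layout: all values of one column are stored contiguously.
column_layout = {
    "id": [r[0] for r in records],
    "region": [r[1] for r in records],
    "amount": [r[2] for r in records],
}


def sum_amount_row_store(layout: list, num_columns: int = NUM_COLUMNS) -> tuple[int, int]:
    """Sum 'amount' in a row layout; every field of every row must be read."""
    total, values_read = 0, 0
    for i in range(0, len(layout), num_columns):
        row = layout[i:i + num_columns]  # the whole row comes off storage together
        values_read += len(row)
        total += row[2]                  # 'amount' is the third field
    return total, values_read


def sum_amount_column_store(layout: dict) -> tuple[int, int]:
    """Sum 'amount' in a column layout; only that column's values are read."""
    amounts = layout["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(row_layout))        # (450, 12)
print(sum_amount_column_store(column_layout))  # (450, 4)
```

Both layouts produce the same answer, but the columnar one touches only one third of the stored values here; the gap grows with the number of columns a query does not need.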