Why do we need HBase when the data is already stored in HDFS, the core data storage layer within
Hadoop? HBase is useful for workloads other than MapReduce execution, for operations that are
awkward to perform directly on HDFS files, and, above all, when you need random access to the data.
HBase satisfies two types of use cases:
It provides a database-style interface to Hadoop, which enables developers to deploy programs
that can quickly read or write specific subsets of data in an extremely voluminous data set,
without having to scan and process the entire data set.
It provides a transactional platform for running high-scale, real-time applications as an ACID-
compliant database (meeting standards for atomicity, consistency, isolation, and durability) while
handling the incredible volume, variety, and complexity of data encountered on the Hadoop
platform. HBase supports the following properties of ACID compliance:
Atomicity: All mutations are atomic within a row. A write that touches several columns of the
same row will either wholly succeed or wholly fail (see the sketch after this list).
Consistency: Any row returned by a read operation consists of a complete row that exists, or
existed at some point, in the table.
Isolation: The isolation level provided corresponds to what a traditional DBMS calls “read committed.”
Durability: All visible data in the system is durable data; in other words, a read will never
return data that has not been made durable on disk.
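To make both use cases concrete, here is a minimal sketch using the HBase Java client (the HBase 2.x API is assumed; the table name people, the column family person, and the row key user-0042 are illustrative only, not part of any real schema). A Put that carries several column mutations for the same row is applied atomically, and a Get retrieves that single row without touching the rest of the data set.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowLevelAccessSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("people"))) {

            // Both column mutations target the same row key, so HBase applies
            // them atomically: a reader sees either both values or neither.
            Put put = new Put(Bytes.toBytes("user-0042"));
            put.addColumn(Bytes.toBytes("person"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"));
            put.addColumn(Bytes.toBytes("person"), Bytes.toBytes("comments"), Bytes.toBytes("new hire"));
            table.put(put);

            // Random read of one specific row out of a potentially huge table;
            // no scan of the full data set is required.
            Result result = table.get(new Get(Bytes.toBytes("user-0042")));
            String name = Bytes.toString(
                    result.getValue(Bytes.toBytes("person"), Bytes.toBytes("name")));
            System.out.println("name = " + name);
        }
    }
}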
HBase is different from traditional RDBMS and DBMS platforms and is architected and deployed like
other NoSQL databases.
HBase architecture
Data is organized in HBase as tables, rows, and columns, much like a database; however, that is
where the similarity ends. Let us look at the data model of HBase first and then examine the
implementation architecture.
Tables:
Tables are made of rows and columns.
Table cells are the intersection of row and column coordinates. Each cell is versioned by
default with a timestamp. The contents of a cell are treated as an uninterpreted array of bytes.
A table row has a sortable row key and an arbitrary number of columns.
Rows:
Table row keys are also byte arrays, so almost anything can serve as the row key, as
opposed to the strongly typed data types of a traditional database.
Table rows are kept sorted in byte order by row key, which acts as the table's primary key, and
all table access is via this primary key (illustrated in the range scan sketch after this list).
Columns are grouped into families, and a row can have as many columns as are loaded into it.
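The byte-ordered row key is what makes bounded scans cheap: because rows are physically stored in row-key order, a scan between a start key and a stop key reads only that contiguous slice of the table. The sketch below again assumes the HBase 2.x Java client; the table name people and the user-... row keys are hypothetical.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyRangeScanSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("people"))) {

            // Rows are stored sorted by the byte order of their row keys, so a
            // scan bounded by a start key (inclusive) and a stop key (exclusive)
            // touches only the rows in that key range.
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("user-1000"))
                    .withStopRow(Bytes.toBytes("user-2000"));

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}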
Columns and column groups (families):
In HBase, row columns are grouped into column families.
All members of a column family share a common prefix; for example, the columns
person:name and person:comments are both members of the person column family, whereas
email:identifier belongs to the email family.
A table's column families must be specified upfront as part of the table schema definition,
as the sketch below illustrates.
New column family members can be added on demand.
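The upfront declaration of column families is visible in the table-creation API. This sketch (HBase 2.x Java client assumed; the table name people is hypothetical) defines the person and email families at creation time; individual qualifiers such as person:name or email:identifier need no schema change and are created implicitly the first time data is written to them.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {

            // Column families ("person" and "email") are declared up front as
            // part of the schema; the columns inside each family are not.
            admin.createTable(
                    TableDescriptorBuilder.newBuilder(TableName.valueOf("people"))
                            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("person"))
                            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("email"))
                            .build());
        }
    }
}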