At the storage level, all columns in a column family are stored together in files
called HFiles, as key-value pairs in a binary format. HFiles are sorted, immutable
maps that are internally organized as data blocks with a block index.
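As a rough illustration of the "sorted immutable map with a block index" idea, the following Python sketch builds a toy in-memory file. `ToyHFile` and its `block_size` parameter are hypothetical and do not reflect the real HFile binary layout or any HBase API; the point is only that a reader can binary-search the block index and then scan a single data block instead of the whole file.

```python
from bisect import bisect_right


class ToyHFile:
    """Hypothetical in-memory stand-in for an HFile (not the real format).

    Key-value pairs are sorted once at write time, cut into fixed-size
    data blocks, and never modified afterwards. A block index records the
    first key of each block.
    """

    def __init__(self, items, block_size=2):
        pairs = sorted(items.items())
        # Immutable tuple of data blocks, each a tuple of (key, value) pairs.
        self._blocks = tuple(
            tuple(pairs[i:i + block_size])
            for i in range(0, len(pairs), block_size)
        )
        # Block index: the first key of every data block, already sorted.
        self._index = tuple(block[0][0] for block in self._blocks)

    def get(self, key):
        # Binary-search the index for the last block whose first key <= key.
        pos = bisect_right(self._index, key) - 1
        if pos < 0:
            return None  # key sorts before the first block (or file is empty)
        # Only that one block can contain the key, so scan just that block.
        for k, v in self._blocks[pos]:
            if k == key:
                return v
        return None


hfile = ToyHFile({"row3": "Elan", "row1": "David", "row4": "Maria", "row2": "John"})
print(hfile.get("row2"))  # John
print(hfile.get("row9"))  # None
```

Because the file is never modified after it is built, the index stays valid for the file's whole lifetime, which is what makes this lookup scheme cheap.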
In HBase, the placeholder for a column value is called a cell. Each cell stores the most
recent value as well as the historical values for the column. These versions are kept in
descending timestamp order, so the newest value is found first, which speeds up reads.
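The newest-first ordering can be sketched with a toy cell class. `ToyCell` is a hypothetical helper, not part of HBase, and the timestamps in the usage lines are made up; the sketch only shows that keeping versions sorted in descending timestamp order turns "read the latest value" into "take the first entry".

```python
class ToyCell:
    """Hypothetical cell holding (timestamp, value) versions, newest first."""

    def __init__(self):
        self._versions = []

    def put(self, timestamp, value):
        self._versions.append((timestamp, value))
        # Keep descending timestamp order so index 0 is always the newest.
        self._versions.sort(key=lambda tv: tv[0], reverse=True)

    def latest(self):
        # The common read takes the first entry; no scan is needed.
        return self._versions[0][1] if self._versions else None

    def versions(self, max_versions=None):
        # Values newest first, optionally capped at max_versions.
        values = [value for _, value in self._versions]
        return values if max_versions is None else values[:max_versions]


cell = ToyCell()
cell.put(200, "b")  # hypothetical timestamps, inserted out of order
cell.put(100, "a")
cell.put(300, "c")
print(cell.latest())     # c
print(cell.versions(2))  # ['c', 'b']
```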
Each value in a table is addressed by the combination of its rowkey, column family,
column key, and timestamp. The following figure shows how values are organized in a
table:
Column Family: Customer

RowKey   Name    email              Phone
ROW 1    David   david@gmail.com    982 765 2345
ROW 2    John    john@rediff.com    763 456 1234
ROW 3    Elan    elan@hotmail.com   554 123 0987
ROW 4    Maria   maria@test.net     763 451 4587 / 863 341 4123

Each value in the grid is a cell. Each cell may hold multiple versions of data,
distinguished by timestamp (for example, the two phone numbers for ROW 4).
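To show how a value is addressed by its full coordinates, the sketch below stores part of the Customer table from the figure in a dictionary keyed by (rowkey, column family, column key, timestamp). The timestamps are invented, so which of Maria's two phone numbers counts as newer is an arbitrary assumption, and `get_value` is a hypothetical helper rather than an HBase client call.

```python
# Hypothetical flat model of the Customer table: every value is addressed
# by (rowkey, column family, column key, timestamp).
# The timestamps are invented; the ordering of ROW 4's phones is arbitrary.
table = {
    ("ROW 1", "Customer", "Name", 1): "David",
    ("ROW 1", "Customer", "email", 1): "david@gmail.com",
    ("ROW 1", "Customer", "Phone", 1): "982 765 2345",
    ("ROW 4", "Customer", "Name", 1): "Maria",
    ("ROW 4", "Customer", "email", 1): "maria@test.net",
    ("ROW 4", "Customer", "Phone", 1): "763 451 4587",
    ("ROW 4", "Customer", "Phone", 2): "863 341 4123",
}


def get_value(table, row, family, column, timestamp=None):
    """Return the value at the given coordinates.

    With an explicit timestamp, look up that exact version. Without one,
    return the version with the largest timestamp (the most recent value).
    """
    if timestamp is not None:
        return table.get((row, family, column, timestamp))
    versions = [
        (ts, value)
        for (r, f, c, ts), value in table.items()
        if (r, f, c) == (row, family, column)
    ]
    if not versions:
        return None
    return max(versions, key=lambda tv: tv[0])[1]


print(get_value(table, "ROW 4", "Customer", "Phone"))     # 863 341 4123
print(get_value(table, "ROW 4", "Customer", "Phone", 1))  # 763 451 4587
```

Dropping the timestamp from a lookup is what makes the "most recent value" the default answer, matching the newest-first ordering described for cells.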
Just as column families group columns, HBase groups a contiguous range of rows into a
region, and each region is stored and served by a region server. Regions can be thought
of as data partitions in the RDBMS world, and they help the overall HBase architecture
scale. A maximum size is defined for regions; once a region exceeds this limit, it is
split into two at its middle. This process is comparable to auto-sharding in the RDBMS
world.
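The split-on-overflow behavior can be sketched as follows. This is a simplified model under stated assumptions: `ToyRegion`, `split`, and `put` are hypothetical names, the size limit is counted in rows instead of bytes, and the split point is simply the middle row key by count rather than anything HBase-specific.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegion:
    """Hypothetical region covering the contiguous row-key range [start, end).

    An empty string for start or end means "open-ended" on that side.
    """
    start: str
    end: str
    rows: dict = field(default_factory=dict)  # rowkey -> row data

    def contains(self, rowkey):
        return self.start <= rowkey and (self.end == "" or rowkey < self.end)


def split(region):
    # Simplification: split at the middle row key by count.
    keys = sorted(region.rows)
    mid = keys[len(keys) // 2]
    left = ToyRegion(region.start, mid, {k: region.rows[k] for k in keys if k < mid})
    right = ToyRegion(mid, region.end, {k: region.rows[k] for k in keys if k >= mid})
    return left, right


def put(regions, rowkey, value, max_rows):
    # max_rows stands in for the maximum region size (bytes in HBase).
    for i, region in enumerate(regions):
        if region.contains(rowkey):
            region.rows[rowkey] = value
            if len(region.rows) > max_rows:
                # Over the limit: replace this region with its two halves.
                regions[i:i + 1] = split(region)
            return
    raise KeyError(rowkey)


regions = [ToyRegion("", "")]
for n, key in enumerate(["row1", "row2", "row3", "row4", "row5"]):
    put(regions, key, n, max_rows=4)
print([(r.start, r.end) for r in regions])  # [('', 'row3'), ('row3', '')]
```

Because each region covers a contiguous key range, routing a row is a range check, and a split just divides one range into two adjacent ones.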
In HBase, records are stored in HFiles as key-value pairs, and each HFile is, in turn,
stored as a binary file. Records from a single column family may be split across
multiple HFiles, but a single HFile never contains data for more than one column family.
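The relationship between column families and HFiles can be modeled with one store per column family. In this sketch `ToyStore` is hypothetical, and it assumes that new files are produced by flushing an in-memory write buffer (the MemStore in HBase); as a result one family accumulates several immutable files while no file ever holds two families.

```python
class ToyStore:
    """Hypothetical store for ONE column family: a write buffer plus HFiles.

    Each "HFile" here is an immutable, sorted tuple of ((rowkey, column), value)
    pairs. A store only writes files for its own family, so a family can span
    many files but a file never mixes families.
    """

    def __init__(self, family):
        self.family = family
        self.buffer = {}   # unflushed edits (HBase calls this the MemStore)
        self.hfiles = []   # immutable files, oldest first

    def put(self, rowkey, column, value):
        self.buffer[(rowkey, column)] = value

    def flush(self):
        # Each flush writes the buffered edits to a brand-new sorted file.
        if self.buffer:
            self.hfiles.append(tuple(sorted(self.buffer.items())))
            self.buffer = {}

    def get(self, rowkey, column):
        # Check unflushed edits, then files from newest to oldest. (HBase
        # resolves versions by timestamp; "newest file wins" is a simplification.)
        key = (rowkey, column)
        if key in self.buffer:
            return self.buffer[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


# One store per column family; "Customer" is the family from the figure.
stores = {"Customer": ToyStore("Customer")}
customer = stores["Customer"]
customer.put("ROW 1", "Name", "David")
customer.flush()
customer.put("ROW 2", "Name", "John")
customer.flush()
print(len(customer.hfiles))           # 2
print(customer.get("ROW 1", "Name"))  # David
```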