Row columns are grouped into column families. All column family members have a common
prefix, so, for example, the columns info:format and info:geo are both members of the
info column family, whereas contents:image belongs to the contents family. The column
family prefix must be composed of printable characters. The qualifying tail, the column
family qualifier, can be made of any arbitrary bytes. The column family and the qualifier
are always separated by a colon character (:).
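
As a minimal sketch of the naming rule above, a column name can be split at the first colon into a family and a qualifier. The helper name split_column is hypothetical and is not part of the HBase API, and "printable" is approximated here as printable ASCII, which is an assumption.

```python
from typing import Tuple


def split_column(column: bytes) -> Tuple[bytes, bytes]:
    """Split b'family:qualifier' at the first colon (hypothetical helper)."""
    family, sep, qualifier = column.partition(b":")
    if not sep:
        raise ValueError("column name must contain a ':' separator")
    # Assumption: "printable" is modeled as printable ASCII (0x20 to 0x7E).
    if not family or not all(0x20 <= b <= 0x7E for b in family):
        raise ValueError("column family must be printable characters")
    # The qualifier is returned untouched: it may hold arbitrary bytes,
    # including further colons.
    return family, qualifier


family, qualifier = split_column(b"info:format")
```

Only the first colon is treated as the separator, so a qualifier that itself contains a colon or non-printable bytes survives intact.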
A table's column families must be specified up front as part of the table schema
definition, but new column family members can be added on demand. For example, a new
column info:camera can be offered by a client as part of an update, and its value
persisted, as long as the column family info already exists on the table.
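
The schema rule above can be illustrated with a toy in-memory model. This is a sketch only: the Table class, its put method, the row key "row1", and the stored values are all hypothetical and do not mirror the HBase client API.

```python
class Table:
    """Toy in-memory model of a table; not the HBase client API."""

    def __init__(self, name, families):
        self.name = name
        # Column families are fixed when the table is defined.
        self.families = frozenset(families)
        self.rows = {}  # row key -> {column name -> value}

    def put(self, row, column, value):
        family, sep, _qualifier = column.partition(":")
        if not sep or family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        # Any qualifier is accepted once its family exists.
        self.rows.setdefault(row, {})[column] = value


photos = Table("photos", ["info", "contents"])
photos.put("row1", "info:format", "jpeg")
# A new family member is added on the fly; the family "info" already exists.
photos.put("row1", "info:camera", "hypothetical-model")
```

A put against a family that was not declared, such as a hypothetical "exif:iso", raises KeyError in this model, which mirrors the requirement that the family be part of the schema before its members can be written.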
Physically, all column family members are stored together on the filesystem. So although
earlier we described HBase as a column-oriented store, it would be more accurate if it
were described as a column-family-oriented store. Because tuning and storage
specifications are done at the column family level, it is advised that all column family
members have the same general access pattern and size characteristics. For the photos
table, the image data, which is large (megabytes), is stored in a separate column family
from the metadata, which is much smaller in size (kilobytes).
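
To make the column-family-oriented layout concrete, the following sketch groups cells into one store per family. It models only the grouping idea, not HBase's actual on-disk format; the function name group_by_family and the sample cells are hypothetical.

```python
from collections import defaultdict


def group_by_family(cells):
    """Group (row, column, value) cells into one store per column family."""
    stores = defaultdict(list)
    for row, column, value in cells:
        family = column.split(":", 1)[0]
        stores[family].append((row, column, value))
    # Keep each store sorted by row key, then by column name.
    return {family: sorted(items) for family, items in stores.items()}


# Hypothetical sample cells for the photos table.
cells = [
    ("row2", "info:format", b"png"),
    ("row1", "contents:image", b"<large image payload>"),
    ("row1", "info:geo", b"hypothetical-location"),
    ("row1", "info:format", b"jpeg"),
]
stores = group_by_family(cells)
```

In this model the small metadata cells end up in the "info" store and the large image payload in the "contents" store, so reading metadata never has to touch the image data.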
In summary, HBase tables are like those in an RDBMS, only cells are versioned, rows are
sorted, and columns can be added on the fly by the client as long as the column family
they belong to already exists.
Regions
Tables are automatically partitioned horizontally by HBase into regions. Each region
comprises a subset of a table's rows. A region is denoted by the table it belongs to, its first row
(inclusive), and its last row (exclusive). Initially, a table comprises a single region, but as
the region grows it eventually crosses a configurable size threshold, at which point it splits
at a row boundary into two new regions of approximately equal size. Until this first split
happens, all load is directed at the single server hosting the original region. As the
table grows, the number of its regions grows. Regions are the units that get distributed
over an HBase cluster. In this way, a table that is too big for any one server can be carried
by a cluster of servers, with each node hosting a subset of the table's total regions. This is
also the means by which the load on a table gets distributed. The online set of sorted
regions comprises the table's total content.
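
The splitting behavior described above can be sketched with a toy model. Several things here are assumptions made for illustration: the threshold is a row count (max_rows) rather than a size in bytes, row keys are assumed unique, and the names Region, find_region, and put are hypothetical, not HBase APIs.

```python
import bisect


class Region:
    """Toy region covering row keys in [start, end); end=None is unbounded."""

    def __init__(self, start, end, rows=()):
        self.start, self.end = start, end
        self.rows = sorted(rows)


def find_region(regions, row):
    """Return the index of the region whose key range contains row."""
    starts = [r.start for r in regions]
    return bisect.bisect_right(starts, row) - 1


def put(regions, row, max_rows=4):
    """Insert a row key; split the hosting region once it exceeds max_rows."""
    i = find_region(regions, row)
    region = regions[i]
    bisect.insort(region.rows, row)
    if len(region.rows) > max_rows:
        half = len(region.rows) // 2
        mid = region.rows[half]  # split at a row boundary
        left = Region(region.start, mid, region.rows[:half])
        right = Region(mid, region.end, region.rows[half:])
        regions[i:i + 1] = [left, right]


regions = [Region("", None)]  # a new table starts as a single region
for key in ["row5", "row1", "row3", "row2", "row4", "row6"]:
    put(regions, key)
```

After these inserts the model holds two regions, one covering keys before "row3" and one covering "row3" onward. Reading the regions in order yields every row in sorted order, which matches the statement that the sorted set of regions makes up the table's total content.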