Our choice of schema is derived from knowing the most efficient way we can read from
HBase. Rows and columns are stored in increasing lexicographical order. Though there
are facilities for secondary indexing and regular expression matching, they come at a
performance penalty. It is vital that you understand the most efficient way to query your
data in order to choose the most effective setup for storing and accessing it.
For the stations table, the choice of stationid as the key is obvious because we
will always access information for a particular station by its ID. The observations
table, however, uses a composite key that adds the observation timestamp at the end. This
will group all observations for a particular station together, and by using a reverse-order
timestamp (Long.MAX_VALUE - timestamp) and storing it as binary, observations
for each station will be ordered with the most recent observation first.
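The composite key described above can be sketched in plain Java, without any HBase dependency. This is a minimal illustration under stated assumptions: the class ObservationKeys, its methods, and the station ID used in main are hypothetical names chosen for the example, and the key layout (station ID bytes followed by an 8-byte big-endian reversed timestamp) is one plausible reading of the text, not necessarily the exact layout the application uses. Timestamps are assumed to be non-negative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the HBase API.
final class ObservationKeys {
    private ObservationKeys() {}

    // Assumed layout: <station ID bytes><8-byte reversed timestamp>.
    static byte[] makeRowKey(String stationId, long timestampMillis) {
        byte[] station = stationId.getBytes(StandardCharsets.UTF_8);
        // Larger (more recent) timestamps yield smaller reversed values.
        long reversed = Long.MAX_VALUE - timestampMillis;
        return ByteBuffer.allocate(station.length + Long.BYTES)
                .put(station)
                .putLong(reversed) // ByteBuffer writes big-endian by default
                .array();
    }

    // Compare keys as a byte-ordered store would: unsigned, byte by byte.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // The station ID below is an illustrative fixed-length value.
        byte[] older = makeRowKey("011990-99999", 1_000L);
        byte[] newer = makeRowKey("011990-99999", 2_000L);
        // The newer observation sorts first, so this prints "true".
        System.out.println(compare(newer, older) < 0);
    }
}
```

Because the station ID is a fixed-length prefix, all keys for one station are contiguous, and within that range the reversed timestamp puts the latest observation at the front.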
NOTE
We rely on the fact that station IDs are a fixed length. In some cases, you will need to zero-pad number
components so row keys sort properly. Otherwise, because only the byte order is considered, 10 sorts
before 2 (whereas 02 sorts before 10).
Also, if your keys are integers, use a binary representation rather than persisting the string version of a
number. The binary form consumes less space.
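The sorting and size points in this note can be checked with a small standalone Java sketch. The class KeyEncodingDemo and its helpers are hypothetical names for this example only. It assumes non-negative numbers and a big-endian 8-byte binary encoding. Note that the space saving applies to large values: a long always takes 8 bytes, whereas its decimal text can take up to 19.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; none of these helpers are part of the HBase API.
final class KeyEncodingDemo {
    private KeyEncodingDemo() {}

    // Bytes of a number persisted as plain decimal text.
    static byte[] text(long n) {
        return Long.toString(n).getBytes(StandardCharsets.US_ASCII);
    }

    // Bytes of a number persisted as zero-padded decimal text of a fixed width.
    static byte[] paddedText(long n, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", n)
                .getBytes(StandardCharsets.US_ASCII);
    }

    // Fixed-width binary form: 8 bytes, most significant byte first.
    // Byte order matches numeric order only for non-negative values.
    static byte[] binary(long n) {
        return ByteBuffer.allocate(Long.BYTES).putLong(n).array();
    }

    // True if a sorts strictly before b under unsigned byte-by-byte comparison.
    static boolean sortsBefore(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        System.out.println(sortsBefore(text(10), text(2)));                   // true: "10" before "2"
        System.out.println(sortsBefore(paddedText(2, 2), paddedText(10, 2))); // true: "02" before "10"
        System.out.println(sortsBefore(binary(2), binary(10)));               // true
        System.out.println(text(Long.MAX_VALUE).length);                      // 19
        System.out.println(binary(Long.MAX_VALUE).length);                    // 8
    }
}
```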
In the shell, define the tables as follows:
hbase(main):001:0> create 'stations', {NAME => 'info'}
0 row(s) in 0.9600 seconds
hbase(main):002:0> create 'observations', {NAME => 'data'}
0 row(s) in 0.1770 seconds
WIDE TABLES
All access in HBase is via primary key, so the key design should lend itself to how the data is going to be
queried. One thing to keep in mind when designing schemas is that a defining attribute of
column(-family)-oriented stores, such as HBase, is the ability to host wide and sparsely populated tables
at no incurred cost. [139]
There is no native database join facility in HBase, but wide tables can make it so that there is no need for
database joins to pull from secondary or tertiary tables. A wide row can sometimes be made to hold all
data that pertains to a particular primary key.
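One way to picture a wide, sparse table is as a sorted map of row keys to sorted maps of columns, where a column that was never written simply does not exist for that row. The Java sketch below is only an in-memory model of that idea; it is not the HBase client API, and the class WideRowSketch, the row keys, and the column names are all hypothetical. It also sorts by Java string order rather than by raw bytes, which is a simplification.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory model for illustration only; not the HBase client API.
final class WideRowSketch {
    // row key -> (column name -> value); both levels are kept sorted.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Store a single cell; columns never written take up no entry at all.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // One lookup by row key returns every populated column for that key,
    // so related data kept in the same row needs no join.
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        WideRowSketch table = new WideRowSketch();
        // Hypothetical row keys and column names.
        table.put("station-1", "info:name", "Example Station");
        table.put("station-1", "info:location", "Somewhere");
        table.put("station-2", "info:name", "Another Station");

        System.out.println(table.get("station-1").size()); // 2
        System.out.println(table.get("station-2").size()); // 1
    }
}
```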