Once we have answers, certain practices are followed to ensure optimal table design.
Some of the design practices are as follows:
• Data for a given column family goes into a single store on HDFS. This store
might consist of multiple HFiles, which eventually get converted to a single
HFile using compaction techniques.
• Columns in a column family are also stored together on the disk, so
columns with different access patterns should be kept in different
column families.
• If we design tables with fewer columns and many rows (a tall table),
we might achieve O(1) operations, but at the cost of atomicity, because
HBase guarantees atomic operations only within a single row.
• Each access pattern should be served by a single API call. Needing
multiple calls for one access pattern is a sign of poor design.
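The first practice above, several HFiles being merged into one by compaction, can be sketched in plain Python. This is only an illustration and not the HBase implementation: the tuple layout (rowkey, qualifier, timestamp, value), the function name compact, and the rule of keeping only the newest version of each cell are assumptions made for the example.

```python
import heapq


def compact(hfiles):
    """Merge several sorted "HFiles" into a single sorted list of cells.

    Each hfile is a list of (rowkey, qualifier, timestamp, value) tuples,
    already sorted by rowkey, then qualifier, then newest timestamp first.
    Assumption for this sketch: only the newest version of a cell survives.
    """
    sort_key = lambda cell: (cell[0], cell[1], -cell[2])
    merged = heapq.merge(*hfiles, key=sort_key)
    result = []
    for cell in merged:
        # Same rowkey and qualifier as the previous cell means an older version.
        if result and result[-1][:2] == cell[:2]:
            continue
        result.append(cell)
    return result


# Two hypothetical store files for one column family.
hfile_old = [("row1", "name", 1, "Ann"), ("row2", "name", 1, "Bob")]
hfile_new = [("row1", "name", 2, "Anna"), ("row3", "name", 2, "Cy")]

compacted = compact([hfile_old, hfile_new])
for cell in compacted:
    print(cell)
```

Because every input file is already sorted, the merge is a single sequential pass, which is why the rows in the resulting file stay sorted by rowkey.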
We not only need to design the table schema to store data in a column-family layout
but also consider the read/write pattern for the table, that is, how the application is
going to access the data from an HBase table. Similarly, rowkeys should be designed
based on the access patterns, because each region holds a contiguous range of
rowkeys and the HFiles store rows on disk sorted by rowkey. Hence, the rowkey is
crucial to the I/O performance of HBase.
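To see why the rowkey matters, here is a minimal sketch of a store that keeps rows sorted by key, so that one access pattern ("all events for an account") becomes a single contiguous scan. It is a plain Python simulation, not the HBase client API: the SortedTable class, the make_rowkey helper, the "#" separator, and the zero-padded timestamp are all hypothetical choices for the example.

```python
import bisect


def make_rowkey(account_id, timestamp):
    """Build a composite rowkey; zero-padding keeps lexicographic order
    consistent with numeric timestamp order."""
    return f"{account_id}#{timestamp:010d}"


class SortedTable:
    """Toy table that keeps its rowkeys sorted, as HFiles do on disk."""

    def __init__(self):
        self._keys = []   # sorted list of rowkeys
        self._rows = {}   # rowkey -> value

    def put(self, rowkey, value):
        if rowkey not in self._rows:
            bisect.insort(self._keys, rowkey)
        self._rows[rowkey] = value

    def scan_prefix(self, prefix):
        """Return all rows whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # sorted keys: nothing after this point can match
            out.append((key, self._rows[key]))
        return out


table = SortedTable()
table.put(make_rowkey("acct42", 300), "c")
table.put(make_rowkey("acct7", 100), "x")
table.put(make_rowkey("acct42", 100), "a")
table.put(make_rowkey("acct42", 200), "b")

# One call answers the access pattern "all events for acct42".
events = table.scan_prefix("acct42#")
print(events)
```

Including the separator in the prefix keeps "acct42#" from also matching a different account such as "acct420".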
HBase doesn't support cross-row transactions, so client code should
avoid transactional logic that spans multiple rows and stay simple.
HBase derives its design from Google's Bigtable. A one-row-per-account design (a
wide table) might hold multiple terabytes in a single row, which can work without
problems or can turn out to be a poor design. However, the same information can
also be stored in a tall table (lots of rows with fewer columns), which also
provides performance benefits. This performance benefit comes at the cost of
atomicity. The physical storage for both table designs is essentially the same.
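The wide-versus-tall trade-off can be illustrated with a small sketch. The dictionaries below are hypothetical stand-ins for an HBase table (rowkey mapped to columns), and the account, event IDs, and column names are invented for the example. Both layouts flatten to the same cells, but an update covering several events touches one row in the wide layout and several rows in the tall one, which is where row-level atomicity is lost.

```python
# Wide layout: one row per account, one column per event.
wide = {
    "acct42": {"evt:100": "a", "evt:200": "b"},
}

# Tall layout: one row per (account, event) pair.
tall = {
    "acct42#100": {"evt:data": "a"},
    "acct42#200": {"evt:data": "b"},
}


def flatten_wide(table):
    """Turn the wide layout into sorted (account, event_id, value) cells."""
    cells = []
    for account, columns in table.items():
        for qualifier, value in columns.items():
            event_id = qualifier.split(":", 1)[1]
            cells.append((account, event_id, value))
    return sorted(cells)


def flatten_tall(table):
    """Turn the tall layout into sorted (account, event_id, value) cells."""
    cells = []
    for rowkey, columns in table.items():
        account, event_id = rowkey.split("#", 1)
        cells.append((account, event_id, columns["evt:data"]))
    return sorted(cells)


# Rows that an update of events 100 and 200 for acct42 would touch.
rows_touched_wide = {"acct42"}
rows_touched_tall = {f"acct42#{event_id}" for event_id in ("100", "200")}

print(flatten_wide(wide) == flatten_tall(tall))
print(len(rows_touched_wide), len(rows_touched_tall))
```

In the wide layout the whole update falls under a single-row operation, while in the tall layout it is two independent row operations that can succeed or fail separately.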
 