Advanced Data Modeling - HBase Essentials


HBase has no built-in support for secondary indexes, so we can use the following approaches to create them. A secondary index stores a mapping between the new coordinates and the existing coordinates:

• Application-managed approach: This approach moves the responsibility completely into the application or client layer. It deals with a data table and one or more lookup/mapping tables. Whenever the code writes into the data table, it also updates the lookup tables. The main advantage is full control over how keys are mapped, because the mapping logic is written entirely at the client's end. However, this liberty carries a cost: a client process that fails partway through can leave orphaned mappings. Cleaning them up (for example, with a MapReduce job) is additional overhead, and the lookup/mapping tables also take up cluster space and consume processing power.

• Indexing solutions for HBase: Other solutions also provide secondary index support for HBase, such as the Lily HBase Indexer (http://ngdata.github.io/hbase-indexer/). This solution quickly indexes HBase rows into Solr and provides the ability to easily search for any content stored in HBase. Such solutions do not require a separate table for each index; rather, they maintain the indexes purely in memory. They index the on-disk data, and during searches only the in-memory index details are used to locate the data. The main advantage of this solution is that the index is never out of sync.

HBase also provides an advanced feature called coprocessors, which can be used to achieve behavior similar to that of secondary indexes. Coprocessors provide a flexible and generic framework for extending HBase with distributed computation that runs directly within the HBase server processes.
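
The application-managed approach described above can be sketched in plain Java. This is only a model under stated assumptions, not the HBase client API: a TreeMap stands in for each table because, like HBase, it keeps row keys sorted. The class name, the "email|userId" lookup-key layout, and the method names are all hypothetical, and the sketch assumes that values never contain the '|' separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a client-managed secondary index (not the HBase API).
class ClientManagedIndex {
    // Data table: row key (user id) -> indexed value (email).
    final TreeMap<String, String> dataTable = new TreeMap<>();
    // Lookup table: "email|userId" -> userId (hypothetical key layout).
    final TreeMap<String, String> lookupTable = new TreeMap<>();

    static String indexKey(String email, String userId) {
        return email + "|" + userId;
    }

    // Every write to the data table also updates the lookup table.
    void put(String userId, String email) {
        String previous = dataTable.put(userId, email);
        if (previous != null) {
            // Drop the mapping that pointed at the old value.
            lookupTable.remove(indexKey(previous, userId));
        }
        lookupTable.put(indexKey(email, userId), userId);
    }

    // Prefix range over the sorted lookup keys: every key starting with "email|".
    List<String> findByEmail(String email) {
        String start = email + "|";
        String stop = email + (char) ('|' + 1); // first key past the prefix
        return new ArrayList<>(lookupTable.subMap(start, true, stop, false).values());
    }

    // Cleanup pass: lookup entries whose data row is missing or holds another value.
    List<String> findOrphans() {
        List<String> orphans = new ArrayList<>();
        for (Map.Entry<String, String> entry : lookupTable.entrySet()) {
            String key = entry.getKey();
            String email = key.substring(0, key.lastIndexOf('|'));
            if (!email.equals(dataTable.get(entry.getValue()))) {
                orphans.add(key);
            }
        }
        return orphans;
    }
}
```

In a real cluster the two writes go to different tables and are not atomic, which is where orphaned mappings come from; findOrphans() plays the role of the periodic cleanup job in this model.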

HBase table scans

In the previous chapter, we took a look at CRUD operations in HBase. Now, let's go a step further and discuss table scans in HBase. Table scans are similar to iterators in Java or nonscrollable cursors in the RDBMS world. A table scan is useful for querying the data to access the complete set of records for a specific value by applying filters. The scan() operation reads a defined portion of the data, similar to the get() operation, and filters are then applied to the portion read to narrow down the results further.
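
The two steps of a scan (read a defined portion of the rows, then apply a filter to that portion) can be modeled in plain Java. This is a sketch under stated assumptions rather than the HBase client API: a TreeMap stands in for a table with sorted row keys, each row holds a single value, and the ScanSketch class and its scan() signature are hypothetical. As in HBase scans by default, the start row is inclusive and the stop row is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Plain-Java model of a range scan followed by a filter (not the HBase API).
class ScanSketch {
    // Returns the row keys in [startRow, stopRow), in key order,
    // whose value is accepted by the filter.
    static List<String> scan(TreeMap<String, String> table,
                             String startRow,
                             String stopRow,
                             Predicate<String> valueFilter) {
        List<String> rowKeys = new ArrayList<>();
        // Step 1: read only the defined portion of the sorted table.
        for (Map.Entry<String, String> row
                : table.subMap(startRow, true, stopRow, false).entrySet()) {
            // Step 2: apply the filter to the portion that was read.
            if (valueFilter.test(row.getValue())) {
                rowKeys.add(row.getKey());
            }
        }
        return rowKeys;
    }
}
```

For example, with rows row1..row4 holding city codes, ScanSketch.scan(table, "row1", "row4", v -> v.equals("NY")) visits row1 through row3 and keeps only those whose value is "NY". Narrowing the row range first keeps the filter from being evaluated against the whole table.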
