Because HBase does not support secondary indexes directly, we can use the following
approaches to create them. Each one stores a mapping between the new coordinates and
the existing coordinates:
• Application-managed approach: This approach moves the responsibility
entirely into the application or client layer. It works with a data table and
one or more lookup/mapping tables: whenever the code writes to the data
table, it also updates the lookup tables. The main advantage is full control
over how keys are mapped, because all of the mapping logic is written on the
client side. However, this freedom carries a cost: if a client process fails
partway through, it can leave orphaned mappings behind, and cleaning them up
(for example, with a MapReduce job) is additional overhead, since the
lookup/mapping tables also take up cluster space and consume processing
power.
• Indexing solutions for HBase: Other indexing solutions also provide
secondary index support for HBase, such as the Lily HBase Indexer
(http://ngdata.github.io/hbase-indexer/). This solution quickly indexes
HBase rows into Solr and makes it easy to search any content stored in
HBase. Some solutions of this kind do not require a separate table for each
index; rather, they maintain the indexes purely in memory. They index the
on-disk data, and during searches only the in-memory index details are
consulted. The main advantage of such in-memory solutions is that the index
is never out of sync.
HBase provides an advanced feature called coprocessors that can also
be used to achieve behavior similar to that of secondary indexes.
Coprocessors provide a flexible and generic extension framework
for distributed computation directly within the HBase server processes.
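To make the application-managed approach more concrete, here is a minimal sketch of the dual-write pattern. It does not use the HBase client API: two in-memory sorted maps stand in for the data table and the lookup table, and the class name, method names, and the `|` separator are all hypothetical choices made for illustration. In a real client, each map update would be a write to a separate HBase table.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the pattern; sorted maps play the role of HBase tables.
class ApplicationManagedIndex {
    // Hypothetical separator, assumed never to appear in an indexed value.
    private static final char SEP = '|';

    // Data table: row key -> indexed column value (for example, userId -> email).
    final TreeMap<String, String> dataTable = new TreeMap<>();
    // Lookup table: "value|rowKey" -> rowKey, so rows sharing a value stay distinct.
    final TreeMap<String, String> lookupTable = new TreeMap<>();

    // The client performs both writes itself; nothing makes the pair atomic.
    void put(String rowKey, String value) {
        String old = dataTable.put(rowKey, value);
        if (old != null && !old.equals(value)) {
            lookupTable.remove(old + SEP + rowKey); // drop the stale mapping
        }
        lookupTable.put(value + SEP + rowKey, rowKey);
    }

    // Resolve a value to its row keys with a prefix range read on the lookup table.
    List<String> findRowKeys(String value) {
        String start = value + SEP;
        String stop = value + (char) (SEP + 1); // first key past the prefix
        return new ArrayList<>(lookupTable.subMap(start, stop).values());
    }

    // Cleanup pass: remove mappings whose data row is missing or holds another value.
    int removeOrphans() {
        int removed = 0;
        Iterator<Map.Entry<String, String>> it = lookupTable.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> entry = it.next();
            String rowKey = entry.getValue();
            String key = entry.getKey();
            String value = key.substring(0, key.length() - rowKey.length() - 1);
            if (!value.equals(dataTable.get(rowKey))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Calling `put("user1", "a@x.org")` and then `findRowKeys("a@x.org")` resolves the value back to `user1` without touching every data row. The `removeOrphans()` pass plays the role of the cleanup job: it walks the whole lookup table, which is why that cleanup is an overhead in its own right.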
HBase table scans
In the previous chapter, we took a look at CRUD operations in HBase. Now, let's take
a step further and discuss table scans in HBase. Table scans in HBase are similar to
iterators in Java or nonscrollable cursors in the RDBMS world. A table scan is
useful for querying the data to access the complete set of records for a specific
value by applying filters. Hence, the scan() operation reads a defined portion of
the data, similar to the get() operation, and the filters are then applied to the
portion that was read to narrow down the results further.
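The two-stage behavior described above (read a defined range of rows, then apply filters to what was read) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: a `TreeMap` stands in for a table sorted by row key, and `ScanSketch`, its `scan` method, and the `Predicate` filter are hypothetical names, not the HBase API. In the HBase Java client, the corresponding pieces are a `Scan` object with start and stop rows plus a `Filter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// In-memory model of a range scan followed by filtering.
class ScanSketch {
    // Rows kept sorted by key, modeled here with a TreeMap.
    final TreeMap<String, String> table = new TreeMap<>();

    // Read rows in [startRow, stopRow) in key order, then keep those the filter accepts.
    List<String> scan(String startRow, String stopRow, Predicate<String> valueFilter) {
        List<String> matches = new ArrayList<>();
        // Stage 1: the range bounds define the portion of data that is read.
        for (Map.Entry<String, String> row : table.subMap(startRow, stopRow).entrySet()) {
            // Stage 2: the filter narrows the rows that were read.
            if (valueFilter.test(row.getValue())) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }
}
```

Like an iterator or a nonscrollable cursor, the loop moves forward through the rows once and never steps back. The start row is inclusive and the stop row exclusive in this sketch, which mirrors the default behavior of an HBase scan.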