Adding Structure with Hive - Microsoft Big Data Solutions

Database Reference

In-Depth Information

wonder how Hive can support them. The short answer is that indexes work

differently in Hive. For the slightly longer answer, read on.

When an index is created in Hive, it creates a new table to store the indexed

values in. The primary benefit of this is that Hive can load a smaller number

of columns (and thus use less memory and disk resources) to respond to

queries that use those columns. However, this benefit in query performance

comes at the cost of processing the index and the additional storage space

required for it. In addition, unlike indexes in most relational systems, Hive

does not automatically update the index when new data is added to the

indexed table. You are responsible for rebuilding the index as necessary.

To create an index, you use the CREATE INDEX statement. You must

provide the table and columns to use for creating the index. You also need to

providethetypeofindexhandlertouse.AswithmanypartsofHive,indexes

are designed to be extensible, so you can develop your own index handlers

in Java. In the following example, the COMPACT index handler is used:

CREATE INDEX customerIndex

ON TABLE customer (state)

AS 'COMPACT'

WITH DEFERRED REBUILD

IN TABLE customerIndexTable;

Another option for the index handler is BITMAP . This handler creates

bitmapindexes,whichworkwellforcolumnsthatdon'thavealargenumber

of distinct values.

The index creation also specifies the table where the index data will be

placed. This is optional; however, it does make it easier to see what the

index contains. Most of the standard options for CREATE TABLE can also

be specified for the table that holds the index.

The WITH DEFERRED REBUILD clause tells Hive not to populate the index

immediately. Rather, you tell it to begin rebuilding with the ALTER INDEX .

. . REBUILD command:

ALTER INDEX customerIndex

ON TABLE customer

REBUILD;

Search WWH ::

Custom Search

Home