Database Reference
In-Depth Information
wonder how Hive can support them. The short answer is that indexes work
differently in Hive. For the slightly longer answer, read on.
When an index is created in Hive, it creates a new table to store the indexed
values in. The primary benefit of this is that Hive can load a smaller number
of columns (and thus use less memory and disk resources) to respond to
queries that use those columns. However, this benefit in query performance
comes at the cost of processing the index and the additional storage space
required for it. In addition, unlike indexes in most relational systems, Hive
does not automatically update the index when new data is added to the
indexed table. You are responsible for rebuilding the index as necessary.
To create an index, you use the CREATE INDEX statement. You must
provide the table and columns to use for creating the index. You also need to
providethetypeofindexhandlertouse.AswithmanypartsofHive,indexes
are designed to be extensible, so you can develop your own index handlers
in Java. In the following example, the COMPACT index handler is used:
CREATE INDEX customerIndex
ON TABLE customer (state)
AS 'COMPACT'
WITH DEFERRED REBUILD
IN TABLE customerIndexTable;
Another option for the index handler is BITMAP . This handler creates
bitmapindexes,whichworkwellforcolumnsthatdon'thavealargenumber
of distinct values.
The index creation also specifies the table where the index data will be
placed. This is optional; however, it does make it easier to see what the
index contains. Most of the standard options for CREATE TABLE can also
be specified for the table that holds the index.
The WITH DEFERRED REBUILD clause tells Hive not to populate the index
immediately. Rather, you tell it to begin rebuilding with the ALTER INDEX .
. . REBUILD command:
ALTER INDEX customerIndex
ON TABLE customer
REBUILD;
Search WWH ::




Custom Search