The Advancing Art of Indexing - High-Performance Web Databases: Design, Development, and Deployment - page 640

Databases Reference

In-Depth Information

Inverted indexing delivers the broadest range of function and flexibility

for ad hoc data access and analysis. Users can obtain truly interactive ac-

cess to data across the enterprise.

Inverted indexes expedite fast, ad hoc searches of previously undefined

queries. Inverted file indexing allows users to find information based on

any combination of qualifying criteria. Regardless of where the criteria oc-

cur in a field, query results process in seconds — without serial reads or

sequential index.

For example, a user wants to know, “How many customers

in the northeast (NE) region bought a product last quarter at the promo-

tional price?” The traditional index, or B-tree, could quickly identify all the

NE region customers, but would be of no use to also select those that

bought in the last quarter at a particular price. To find those records, the

processor must retrieve the NE region customer records from disk and

evaluate each one for the remaining criteria. If the initial selection yields a

large result — say, several hundred thousand records — the processor

must physically retrieve every record. Next, it must evaluate the transac-

tion date and amount fields for a match to the query criteria. Furthermore,

B-tree indexes are required to scan records byte-by-byte. They can be of no

use when searching for records where the selection criteria is buried with-

in the record, such as an appended product code or first name in a name

field listing lastname-firstname.

An Example.

In contrast, inverted file indexes sort and store all values in indexed

fields. If a table contains sales data with records 1, 5, 13, 22, and 70 repre-

senting the NE region, an inverted index would contain NE with pointers to

records 1, 5, 13, 22, and 70. They select records almost instantly by simply

scanning the index files for the appropriate values and comparing the

record IDs for the shared values — the kind of computation a computer ex-

cels at doing. This process takes place at the index level. Inverted indexes

augment the relational data base to provide the high-performance data ac-

cess that native B-trees cannot.

RELATIONAL INDEXES VERSUS INVERTED INDEXES

Relational data bases offer great retrieval capabilities and flexibility, al-

lowing users to access the data in whatever way they need — unfortunate-

ly, it is often at the cost of performance. Though structured query language

(SQL) contains syntax for the addition and deletion of indexes, no syntax is

included to refer to an index in a query. Therefore, indexed searches are

controlled by the RDBMS and, if available, an optimizer.

When a user submits a query, the RDBMS determines how to resolve the

query, choosing an index, if defined, to improve performance. Without an

index, a sequential search or table scan will be used. The more complex the

Next Page

High-Performance Web Databases: Design, Development, and Deployment

Search WWH ::

Custom Search

Home