Multidimensional Clustering - Physical Database Design

Research Laboratory in New York present the idea of MDC, describing the technical benefits and merits. I confess that my first reaction was: I don't get it. That's impossible. Fortunately, it is possible, under the right conditions. In this chapter we'll discuss what MDC is, how it works, and how to design a database to exploit it. In addition to its clustering benefits, MDC also produces dramatic improvements in indexing efficiency. The combination of clustering and indexing benefits allows the database designer to index large volumes of data using three orders of magnitude less storage than traditional indexes, and the ability to cluster across multiple dimensions simultaneously can often achieve performance gains of an order of magnitude.

MDC has been motivated to a large extent by the spectacular growth of relational data, which has spurred the continual research and development of improved techniques for handling large data sets and complex queries. In particular, online analytical processing (OLAP) and decision support systems (DSS) have become popular for data mining and business analysis. OLAP and DSS systems are characterized by multidimensional analysis of compiled enterprise data, and their queries typically involve group-by, aggregation, (multidimensional) range queries, cube, roll-up, and drill-down.

The performance of multidimensional and single-dimensional queries (especially those using group-by and range queries) is often dramatically improved through data clustering, which can significantly reduce input/output (I/O) costs and modestly reduce CPU costs. Yet choosing the clustering dimensions and the granularity of the clustering is nontrivial, and can be difficult even for experienced database designers and industry experts.
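
As a rough illustration of the I/O effect described above, the following Python sketch counts how many fixed-size pages a range query has to touch when rows are stored in key order versus in arbitrary order. It is a toy model under stated assumptions, not a description of how DB2 or any other DBMS stores data: the names ROWS_PER_PAGE, pages_touched, and demo are hypothetical, the table has a single integer key, and it shows ordinary single-dimension clustering only, not MDC itself.

```python
import random

ROWS_PER_PAGE = 100  # hypothetical page capacity, chosen only for illustration


def pages_touched(keys, lo, hi, rows_per_page=ROWS_PER_PAGE):
    """Count distinct pages holding at least one row with lo <= key <= hi.

    `keys` lists the row keys in physical storage order, so the row at
    position i lives on page i // rows_per_page.
    """
    return len({i // rows_per_page for i, k in enumerate(keys) if lo <= k <= hi})


def demo(n_rows=10_000, lo=2_000, hi=2_999, seed=0):
    """Return (clustered_pages, unclustered_pages) for one range query."""
    clustered = list(range(n_rows))            # rows stored in key order
    unclustered = clustered[:]
    random.Random(seed).shuffle(unclustered)   # rows stored in arbitrary order
    return (pages_touched(clustered, lo, hi),
            pages_touched(unclustered, lo, hi))


if __name__ == "__main__":
    c, u = demo()
    print(f"clustered: {c} pages, unclustered: {u} pages")
```

In the clustered layout the 1,000 qualifying rows sit on 10 consecutive pages, whereas in the shuffled layout they are scattered over most of the table's 100 pages. Real I/O costs also depend on factors this sketch ignores, such as buffering and prefetching.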

MDC techniques have been shown to have very significant performance benefits for complex workloads. The only current industrial implementation of MDC is in IBM's DB2 UDB for Linux, UNIX, and Windows. Prior to the IBM implementation, most of the research literature on MDC had focused on how to better design database storage structures, rather than on how to select the clustering dimensions. In other words, it had focused on what the database vendors need to create under the covers within the database management system (DBMS), and not on what the database administrator (DBA) needs to design. However, for any given storage structure used for MDC, there are complex design tradeoffs in the selection of the clustering dimensions. To perform the physical design of MDC tables, it is important to understand how MDC works, why it works, and what pitfalls to watch out for.

8.1 Understanding MDC

8.1.1 Why Clustering Helps So Much

To understand the huge benefit that clustering offers, there are two simple ideas to come to terms with first:
