Research Laboratory in New York present the idea of MDC, describing the technical
benefits and merits. I confess that my first reaction was: I don't get it. That's impossible.
Fortunately, it is possible, under the right conditions. In this chapter we'll discuss
what MDC is, how it works, and how to design a database to exploit it. In addition to
its clustering benefits, MDC also produces dramatic improvements in indexing
efficiency. The combination of clustering and indexing benefits allows the database
designer to index large volumes of data using three orders of magnitude less storage
than traditional indexes, and the ability to cluster across multiple dimensions
simultaneously can often achieve performance gains of an order of magnitude.
MDC has been motivated to a large extent by the spectacular growth of relational
data, which has spurred the continual research and development of improved
techniques for handling large data sets and complex queries. In particular, online analytical
processing (OLAP) and decision support systems (DSS) have become popular for data
mining and business analysis. OLAP and DSS workloads are characterized by
multidimensional analysis of compiled enterprise data, and typically include transactional
queries that use group-by, aggregation, (multidimensional) range predicates, cube,
roll-up, and drill-down.
The performance of multidimensional and single-dimensional queries (especially
those using group-by and range queries) is often dramatically improved through data
clustering, which can significantly reduce input/output (I/O) costs and modestly reduce
CPU costs. Yet choosing the clustering dimensions and the granularity of the clustering
is nontrivial, and a good design can be difficult to reach even for experienced database
designers and industry experts.
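
As a rough illustration of why clustering cuts I/O for range queries, the following
Python sketch counts how many pages hold qualifying rows when a table is stored in key
order versus random order. The table size, the page size, and the helper name
pages_touched are assumptions made for this example, not details of DB2 or of any real
storage engine, and the sketch models only a single clustering dimension.

```python
import random


def pages_touched(rows, rows_per_page, lo, hi):
    """Count distinct pages holding at least one row whose key is in [lo, hi]."""
    touched = set()
    for position, key in enumerate(rows):
        if lo <= key <= hi:
            # A row's page number is its storage position divided by the page capacity.
            touched.add(position // rows_per_page)
    return len(touched)


ROWS_PER_PAGE = 100          # assumed page capacity, for illustration only
keys = list(range(10_000))   # assumed table of 10,000 rows with unique keys

clustered = sorted(keys)     # rows stored in key order
unclustered = keys[:]        # same rows, stored in random order
random.seed(0)
random.shuffle(unclustered)

# Range query: 2000 <= key <= 2499 (500 qualifying rows in both layouts).
clustered_pages = pages_touched(clustered, ROWS_PER_PAGE, 2_000, 2_499)
unclustered_pages = pages_touched(unclustered, ROWS_PER_PAGE, 2_000, 2_499)

print("clustered pages read:  ", clustered_pages)
print("unclustered pages read:", unclustered_pages)
```

In the clustered layout the 500 qualifying rows sit on five adjacent pages. In the
shuffled layout the same rows are scattered, so the query has to visit far more of the
table's 100 pages to retrieve them.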
MDC techniques have been shown to have very significant performance benefits
for complex workloads. The only current industrial implementation of MDC is in
IBM's DB2 UDB for Linux, UNIX, and Windows. Prior to the IBM implementation,
most of the research literature on MDC had focused on how to better design database
storage structures, rather than on how to select the clustering dimensions. In other
words, it has focused on what the database vendors need to create under the covers
within the database management system (DBMS), and not on what the database
administrator (DBA) needs to design. However, for any given storage structure used for
MDC, there are complex design tradeoffs in the selection of the clustering dimensions.
To perform the physical design of MDC tables, it's quite important to understand how
MDC works, why it works, and what pitfalls to watch out for.
8.1 Understanding MDC
8.1.1 Why Clustering Helps So Much
To understand the huge benefit that clustering offers, there are two simple ideas to first
come to terms with: