Multidimensional Clustering - Physical Database Design

Databases Reference

In-Depth Information

they would certainly succeed in improving cell density, it would be impossible for the

query compiler to map predicates on the base columns to the block indexes on the

expression-based columns.

A clever scheme to detect monotonicity is described in the ICDE paper by Malke-

mus et al. [2005]. The technique parses the expression that is used in the expression-

based column and does a quick analysis of the operand types and sequence to determine

if the expression can be reliably asserted as monotonic.

Malkemus et al. cite the following examples of real-world data uses where coarsifi-

cation has been used to help improve the creation of dense and effective MDC dimen-

sions:

•

Time attribute used in 30-minute intervals. The timestamp attribute can be

made into a large integer and then divided to obtain this generated column.

•

Latitude divided by 15 (i.e., intervals of 15 degrees).

•

Longitude divided by 30.

•

Year and month of a date (i.e., INT(date) = 100).

•

Mathematical functions to compute rank, error value, range, etc.

Once the generated column is detected as monotonic, the query rewrite engine of the

SQL compiler will rewrite the SQL statement to use the expression-based column

instead of the base column.

8.6.3 Picking the Right Dimensions

Many factors affect the choice of clustering design, including the database schema,

workload, relative importance of cluster-based roll-out and utility maintenance by the

enterprise, impact of multidimensional clustering on storage requirements, and so on.

MDC benefits come mostly from reducing the I/O performed during query and roll-

out operations. To select MDC dimensions to help query performance, begin by

determining the most expensive (long running) queries in the workload or applica-

tion. Once the longest-running queries have been identified, examine each statement

for the predicates that use equality, range, or general inequality conditions. Equality

predicates are specified by '=' or 'IN' conditions. Range predicates are specified by '>',

'<', '>=', '<=', or 'BETWEEN' clauses in SQL. The columns used in these clauses will

provide the key candidates for clustering. This is not very different than what might

be done when selecting indexes. What is slightly different here is the following:

•

You rarely need to be concerned about multipart dimensions. Each dimension

can usually be defined as a single column. To achieve multidimensional quali-

ties, simply create n single-part clustering dimensions.

Search WWH ::

Custom Search

Home