Databases Reference
In-Depth Information
they would certainly succeed in improving cell density, it would be impossible for the
query compiler to map predicates on the base columns to the block indexes on the
expression-based columns.
A clever scheme to detect monotonicity is described in the ICDE paper by Malke-
mus et al. [2005]. The technique parses the expression that is used in the expression-
based column and does a quick analysis of the operand types and sequence to determine
if the expression can be reliably asserted as monotonic.
Malkemus et al. cite the following examples of real-world data uses where coarsifi-
cation has been used to help improve the creation of dense and effective MDC dimen-
sions:
Time attribute used in 30-minute intervals. The timestamp attribute can be
made into a large integer and then divided to obtain this generated column.
Latitude divided by 15 (i.e., intervals of 15 degrees).
Longitude divided by 30.
Year and month of a date (i.e., INT(date) = 100).
Mathematical functions to compute rank, error value, range, etc.
Once the generated column is detected as monotonic, the query rewrite engine of the
SQL compiler will rewrite the SQL statement to use the expression-based column
instead of the base column.
8.6.3 Picking the Right Dimensions
Many factors affect the choice of clustering design, including the database schema,
workload, relative importance of cluster-based roll-out and utility maintenance by the
enterprise, impact of multidimensional clustering on storage requirements, and so on.
MDC benefits come mostly from reducing the I/O performed during query and roll-
out operations. To select MDC dimensions to help query performance, begin by
determining the most expensive (long running) queries in the workload or applica-
tion. Once the longest-running queries have been identified, examine each statement
for the predicates that use equality, range, or general inequality conditions. Equality
predicates are specified by '=' or 'IN' conditions. Range predicates are specified by '>',
'<', '>=', '<=', or 'BETWEEN' clauses in SQL. The columns used in these clauses will
provide the key candidates for clustering. This is not very different than what might
be done when selecting indexes. What is slightly different here is the following:
You rarely need to be concerned about multipart dimensions. Each dimension
can usually be defined as a single column. To achieve multidimensional quali-
ties, simply create n single-part clustering dimensions.
Search WWH ::




Custom Search