Databases Reference
In-Depth Information
best predicate might be {circa_asr_flag IN ('Y', 'N')} when it is
uncertain if the version is currently asserted. With this predicate,
the index can support searches for past assertions as well as
searches for current ones. Otherwise, it will require a separate
index to support searches for past assertions.
Clustering
Clustering and partitioning often go together, depending on
the reason for partitioning and the way in which specific DBMSs
support it. Whether or not partitioning is used, choosing the best
clustering sequence can dramatically reduce I/O and improve
performance.
The general concept behind clustering is that as the database
is modified, the DBMS will attempt to keep the data on physical
pages in the same order as that specified in the clustering index.
But each DBMS does this a little differently. One DBMS will clus-
ter each time an insert or update is processed. Another will make
a valiant attempt to do that. A third will only cluster when the
table is reorganized. But regardless of the approach, the result
is to reduce physical I/O by locating data that is frequently
accessed together as physically close together as possible.
Early DBMSs only allowed one clustering index, but newer
releases often support multiple clustering sequences, sometimes
called indexed views or multi-dimensional clustering.
It is important to determine the most frequently used access
paths to the data. Often the most frequently used access paths
are ones based on one or more foreign keys. For asserted version
tables, currently asserted current versions are usually the most
frequently queried data.
Sometimes, the right combination of foreign keys can provide
good clustering for more than one access path. For example,
suppose that a policy table has two low cardinality TFKs, prod-
uct type and market segment, and that each TFK value has
thousands of related policies. 3 We might then create this cluster-
ing index:
{circa_asr_flag, product_type_oid, market_segment_oid,
eff_end_dt, policy_oid}
The circa flag would cluster most of the currently asserted
rows together, keeping them physically co-located under the
lower cardinality columns. Clustering would continue based on
3 Low cardinality means that there are fewer distinct values for the field in the table
which results in more rows having a single value.
Search WWH ::




Custom Search