Databases Reference
In-Depth Information
How it works...
The hash clusters use a hash function to identify the database block in which the rows with a
particular cluster key value are stored, replacing the index functionalities of index clusters.
In this recipe, we have created the
EMP_DEPT_CLUSTER
cluster to store information related
to
DEPARTMENTS
and
EMPLOYEES
, stored together using
DEPARTMENT_ID
as the cluster key.
We then created and populated the
CL_DEPARTMENTS
and
CL_EMPLOYEES
tables with the
data of the
DEPARTMENTS
and
EMPLOYEES
tables of the
HR
schema.
In step 5 and step 6, we analyzed a simple query against the
EMPLOYEES
table and its
hash-clustered counterpart
CL_EMPLOYEES
, to verify that. In the first query, we are using
the
EMP_DEPARTMENT_IX
index to access the rows corresponding to the department
identified by the ID 20. In the second query, we access only the clustered table segment
with a
TABLE ACCESS HASH
operation.
The performance gain is clear we are not accessing two segments—the index and the table — to
retrieve the queried data, but only the table segment.
There's more...
In hash-clustered tables, we have a greater saving in size, because we do not store any data
outside the rows in the clustered tables—no indexes—and the cluster key values won't be
duplicated as with index clusters.
The con is that we must know in advance the maximum cardinality of the tables, because in
the cluster creating phase, we need to declare the maximum number of different keys with
the
HASHKEYS
parameters. If the table grows over this value, we have to rebuild the cluster.
Someone could think of declaring a big
HASHKEYS
value, but this isn't a good idea. The space
required to store
HASHKEYS
rows in the cluster will be allocated when the cluster is created. If
we have oversized the hash cluster, there is a huge waste of space.
Hash clusters are useful when we know in advance—or can predict—the number of key values,
and when we perform regular queries on clustered tables using the equality predicate on the
cluster key.
Even for hash clusters, if we perform frequent full table scans, and we have to allocate a lot
of space for our growing table, it's better to use regular (heap) tables.
Sorted hash clusters
Beginning with Oracle Database 10
g
, hash clusters can be sorted. We can decide the order
in which the rows are sorted—within the same cluster key value—even on multiple fields.