Database Reference
In-Depth Information
9.1.1 Types of Data Partitioning
We identify the following variations of data partitioning:
Vertical partitioning: We vertically partition a table T by logically
splitting it into two or more partitions, each one containing all rows
of T for only a subset of columns. To be able to reconstruct the original
table, each partition requires the same key of T to be part of its schema.
For vertical partitioning to be complete, the union of all columns in the
partitions must be the same as the set of columns in the original table.
Figure 9.1b shows an example of a vertical partition of the original ta-
ble R of Figure 9.1a. Since many queries access only a few columns in a
table, vertical partitioning can reduce the amount of data that needs to
be scanned during query processing (this is similar to the case of a sec-
ondary index but with the difference that vertical partitioning replaces
the primary index of a table without redundancy). Queries that refer
to columns in different partitions, however, incur an additional “recon-
struction” cost in the form of a join (Figure 9.1b shows this mechanism
for a simple query).
Horizontal partitioning: We horizontally partition a table T by divid-
ing its tuples among multiple partitions that share the same schema of
T . This division is specified using a partitioning function, which maps
a given row in a table to a partition number. All rows of the table with
the same partition number are stored in the same partition. Current
DBMSs expose two main kinds of partitioning functions. A hash-based
partitioning function is defined by a set of columns C and an integer
value k . For each tuple in the original table, it maps the values of columns
C into a pseudorandom partition number between 1 and k by using a
system-defined hash function. A range-based partitioning function is de-
fined by a set of columns C and an ordered sequence of disjoint ranges
V that cover the domain of C . A tuple with value C
C 0 maps to the
partition associated with the range in V that includes C 0 . Figure 9.1c
shows a range-based horizontal partitioning of the original table R in
Figure 9.1a using column R.a and ranges R.a < 4 , 4 R.a < 6 , and
6 R.a . Horizontal partitioning can help query processing in several
ways. For instance, equality (respectively, range) predicates can leverage
hash (respectively, range) partitioning to perform static partition elim-
ination and process only the relevant partitions (see the sample query
in Figure 9.1c). If both tables in a join are hash partitioned by the join
columns, we can process the join as the union of partition joins (this is
helpful if each partitioned join can be done in parallel). Grouping and
aggregation can also take advantage of hash-based partitioning on the
grouping columns to perform early aggregation.
=
Search WWH ::




Custom Search