Database Reference
In-Depth Information
state: “Partitioning in database design is the process of assigning a logical
object (relation) from the logical schema of the database to several physical
objects (files) in a stored database. Vertical partitioning subdivides attributes
into groups and assigns each group to a physical object.” That paper, however,
was not concerned with such analytical applications in which ad hoc queries
are dominant. Instead, it discusses how tables may be partitioned in order
to exploit known correlations between attribute hit rates, to obtain better
average query performance. This is worthwhile mainly in routinely repeating
processes where access patterns that display such correlations dominate and
change slowly.
The term [ fully ] transposed file was used in early papers, 3 , 14 - 16 to de-
note what is today called “vertically fragmented” or “vertically decomposed”
data structures, 17 “vertical partitioning,” 9 “column-oriented” databases, 18 or
“column store” databases. 8
Copeland and Khoshafian 12 is the first published paper on transposed files
and related structures that is widely referenced in recent database literature.
While the authors note that some early database systems used a fully trans-
posed storage model, for example, RM, 19 TOD, 10 RAPID, 14 ALDS, 20 Delta 21
and Tanaka, 22 they described in that paper the advantages of a fully decom-
posed storage model (DSM). A DSM is a “[fully] transposed storage model
with surrogates * included.” In a DSM each column of a relational table is
stored in a separate binary association table (BAT), as an array of fixed-size
two-field records (TID, value), where TID refers to Tuple ID. According to
Khoshafian et al., 23 the DSM further assumes that two copies of each binary
relation are stored, one clustered (i.e., sorted or hashed with an index) on each
of the two attributes (TID, value). Copeland and Khoshafian 12 conclude that
there seems to be a general consensus among the database community that
the conventional N-ary Storage Model (NSM) is better. They suggest that the
consensus opinion is not well founded and that neither is clearly better until
a closer analysis is made.
In Khoshafian et al., 23 a parallel query-processing strategy for the DSM is
presented, called the pivot algorithm . The algorithm makes use of the join in-
dex concept. 24 An informal description of a generic select-join-project query
is given in the paper, where all selects are assumed to be range restriction
operations and all joins are equi-joins. The initial select phase executes a
select operation for every predicate binding in the query, using the appropri-
ate value-clustered BAT as index. The output of this phase is a collection
of temporary index lists, each containing the TIDs of selected tuples from
conceptual relation tables. All these operations are done in parallel. Dur-
ing the pivot phase of the algorithm, the main m -way join operation of the
query is executed. A “pivot” TID column is chosen from those attributes that
* The term surrogate is used by some researchers to denote object identifiers, or as here in
relational databases, tuple identifiers or TIDs . We will use the latter term unless we make a
direct quotation.
Search WWH ::




Custom Search