Databases Reference
In-Depth Information
that can be generated (including the cuboids generated by climbing up the hierarchies
along each dimension) is
n Y
i D1 .
Total number of cuboids D
L i C1
/
,
(4.1)
where L i is the number of levels associated with dimension i . One is added to L i in
Eq. (4.1) to include the virtual top level, all . (Note that generalizing to all is equivalent to
the removal of the dimension.)
This formula is based on the fact that, at most, one abstraction level in each dimen-
sion will appear in a cuboid. For example, the time dimension as specified before has
four conceptual levels, or five if we include the virtual level all . If the cube has 10 dimen-
sions and each dimension has five levels (including all ), the total number of cuboids
that can be generated is 5 10 9.810 6 . The size of each cuboid also depends on the
cardinality (i.e., number of distinct values) of each dimension. For example, if the All-
Electronics branch in each city sold every item, there would bej city jj item jtuples in the
city item group-by alone. As the number of dimensions, number of conceptual hierar-
chies, or cardinality increases, the storage space required for many of the group-by's will
grossly exceed the (fixed) size of the input relation.
By now, you probably realize that it is unrealistic to precompute and materialize all
of the cuboids that can possibly be generated for a data cube (i.e., from a base cuboid).
If there are many cuboids, and these cuboids are large in size, a more reasonable option
is partial materialization ; that is, to materialize only some of the possible cuboids that
can be generated.
Partial Materialization: Selected Computation
of Cuboids
There are three choices for data cube materialization given a base cuboid:
1. No materialization : Do not precompute any of the “nonbase” cuboids. This leads
to computing expensive multidimensional aggregates on-the-fly, which can be extre-
mely slow.
2. Full materialization : Precompute all of the cuboids. The resulting lattice of com-
puted cuboids is referred to as the full cube . This choice typically requires huge
amounts of memory space in order to store all of the precomputed cuboids.
3. Partial materialization : Selectively compute a proper subset of the whole set of pos-
sible cuboids. Alternatively, we may compute a subset of the cube, which contains
only those cells that satisfy some user-specified criterion, such as where the tuple
count of each cell is above some threshold. We will use the term subcube to refer to
the latter case, where only some of the cells may be precomputed for various cuboids.
Partial materialization represents an interesting trade-off between storage space and
response time.
 
Search WWH ::




Custom Search