Data Cube Technology - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

order, one BC chunk is fully computed for each row scanned. In comparison, the com-

plete computation of one chunk of the second largest 2-D plane, AC , requires scanning

13 chunks, given the ordering from 1 to 64. That is, a 0 c 0 is fully aggregated only after

the scanning of chunks 1, 5, 9, and 13.

Finally, the complete computation of one chunk of the smallest 2-D plane, AB ,

requires scanning 49 chunks. For example, a 0 b 0 is fully aggregated after scanning chunks

1, 17, 33, and 49. Hence, AB requires the longest scan of chunks to complete its com-

putation. To avoid bringing a 3-D chunk into memory more than once, the minimum

memory requirement for holding all relevant 2-D planes in chunk memory, according

to the chunk ordering of 1 to 64, is as follows: 40400 (for the whole AB plane) C

401000 (for one column of the AC plane) C 1001000 (for one BC plane chunk) D

16,000 C 40,000 C 100,000 D 156,000 memory units.

Suppose, instead, that the chunks are scanned in the order 1, 17, 33, 49, 5, 21, 37, 53,

and so on. That is, suppose the scan is in the order of first aggregating toward the AB

plane, and then toward the AC plane, and lastly toward the BC plane. The minimum

memory requirement for holding 2-D planes in chunk memory would be as follows:

4004000 (for the whole BC plane) C 401000 (for one AC plane row) C 10100

(for one AB plane chunk) D 1,600,000 C 40,000 C 1000 D 1,641,000 memory units.

Notice that this is more than 10 times the memory requirement of the scan ordering of

1 to 64.

Similarly, we can work out the minimum memory requirements for the multiway

computation of the 1-D and 0-D cuboids. Figure 5.4 shows the most efficient way to

compute 1-D cuboids. Chunks for 1-D cuboids A and B are computed during the com-

putation of the smallest 2-D cuboid, AB . The smallest 1-D cuboid, A , will have all of

its chunks allocated in memory, whereas the larger 1-D cuboid, B , will have only one

chunk allocated in memory at a time. Similarly, chunk C is computed during the com-

putation of the second smallest 2-D cuboid, AC , requiring only one chunk in memory

at a time. Based on this analysis, we see that the most efficient ordering in this array

cube computation is the chunk ordering of 1 to 64, with the stated memory allocation

strategy.

Example 5.4 assumes that there is enough memory space for one-pass cube compu-

tation (i.e., to compute all of the cuboids from one scan of all the chunks). If there is

insufficient memory space, the computation will require more than one pass through

the 3-D array. In such cases, however, the basic principle of ordered chunk computation

remains the same. MultiWay is most effective when the product of the cardinalities of

dimensions is moderate and the data are not too sparse. When the dimensionality is high

or the data are very sparse, the in-memory arrays become too large to fit in memory, and

this method becomes infeasible.

With the use of appropriate sparse array compression techniques and careful order-

ing of the computation of cuboids, it has been shown by experiments that MultiWay

array cube computation is significantly faster than traditional ROLAP (relational record-

based) computation. Unlike ROLAP, the array structure of MultiWay does not require

saving space to store search keys. Furthermore, MultiWay uses direct array addressing,

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home