Data Cube Technology - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

Table 5.4 Original Database

TID A B C D E

a 1

b 1

c 1

d 1

e 1

a 1

b 2

c 1

d 2

e 1

a 1

b 2

c 1

d 1

e 2

a 2

b 1

c 1

d 1

e 2

a 2

b 1

c 1

d 1

e 3

of high dimensionality and can quickly compute small local cubes online. It explores the

inverted index data structure, which is popular in information retrieval and Web-based

information systems.

The basic idea is as follows. Given a high-dimensional data set, we partition the

dimensions into a set of disjoint dimension fragments , convert each fragment into its

corresponding inverted index representation, and then construct cube shell fragments

while keeping the inverted indices associated with the cube cells. Using the precom-

puted cubes' shell fragments, we can dynamically assemble and compute cuboid cells of

the required data cube online. This is made efficient by set intersection operations on

the inverted indices.

To illustrate the shell fragment approach, we use the tiny database of Table 5.4 as a

running example. Let the cube measure be count() . Other measures will be discussed

later. We first look at how to construct the inverted index for the given database.

Example 5.9 Construct the inverted index. For each attribute value in each dimension, list the tuple

identifiers ( TIDs ) of all the tuples that have that value. For example, attribute value a 2

appears in tuples 4 and 5. The TID list for a 2 then contains exactly two items, namely 4

and 5. The resulting inverted index table is shown in Table 5.5. It retains all the original

database's information. If each table entry takes one unit of memory, Tables 5.4 and 5.5

each takes 25 units, that is, the inverted index table uses the same amount of memory as

the original database.

“How do we compute shell fragments of a data cube?” The shell fragment com-

putation algorithm, Frag-Shells , is summarized in Figure 5.14. We first partition all

the dimensions of the given data set into independent groups of dimensions, called

fragments (line 1). We scan the base cuboid and construct an inverted index for

each attribute (lines 2 to 6). Line 3 is for when the measure is other than the tuple

count() , which will be described later. For each fragment, we compute the full local

(i.e., fragment-based) data cube while retaining the inverted indices (lines 7 to 8).

Consider a database of 60 dimensions, namely, A 1 , A 2 ,

:::

, A 60 . We can first parti-

tion the 60 dimensions into 20 fragments of size 3:

A 1 , A 2 , A 3 /

A 4 , A 5 , A 6 /

:::

. For each fragment, we compute its full data cube while record-

ing the inverted indices. For example, in fragment

A 58 , A 59 , A 60 /

, we would compute

seven cuboids: A 1 , A 2 , A 3 , A 1 A 2 , A 2 A 3 , A 1 A 3 , A 1 A 2 A 3 . Furthermore, an inverted index

A 1 , A 2 , A 3 /

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home