An Algorithm for Mining Weighted Dense Maximal 1-Complete Regions - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

3.4 Optimizing Techniques

In this section, we present several methods to optimize the time complexity of

the basic algorithm. The data is stored as bit strings, that is, each attribute

is represented as a string of 0 and 1. The major operation of our algorithm

is bit intersection. When the percentage of 1 entries in the dataset is larger

than 10%, using bit strings not only saves memory space, it also makes the

computations more e cient.

Reuse Previous Results in Computing

O

The most expensive op eration in Function findnext i s at line 5, where we

need to compute the S = S

a o ). Notice

that for any node in the prefix tree as shown in Fig. 2, its object set can be

computed incrementally from the object set of its parent. That is, the object

of the child node is the intersection of the object set of the parent node and

ψ ( a o ), where a o is the newly added attribute. For example, the object set of

cd can be computed by taking the intersection of the object set of its parent

node c (

⊕

a o and its object set O = ψ ( S

⊕

). So we can maintain the object sets

of all the nodes on the current branch of the search tree on a stack called

curPath to avoid duplicated intersection operations.

However, when the search moves from one branch to the other, the stack

curpath needs to be updated to maintain the correctness of the object set

computation. For example, after we visited node ad , the next node to be

visited is ac . But the object set of ac can not be incrementally computed

based on the object set of ad , while it can be computed incrementally based

on the object set of a . So we maintain another stack of attribute id called

istack , which keeps track of all the attribute id for which S

{

123467

}

)and ψ ( d )(

{

12467

}

o S

⊕

a o is

true. For example, after we find that the next closed subspace after ϕ

ψ ( φ )

is <{ 123467 },c> , we push the object set into curPath and we push c into

stack istack . When we try to find the next closed subspace after c ,wecheck

if o is larger than the top of istack . If yes, that means that we are still on

the same branch of the search tree, so there is no need to change the stack; if

no, that means that we are jumping to a different branch, so pop up all the

elements in iStack that is larger than o . When popping out the elements in

iStack , curPath is also updated in the similar fashion. That is, whenever pop

out an element from iStack , we also pop out an element from curPath .

◦

Stack of Unpromising Nodes

Observe the search tree in Fig. 2. Starting from node φ , we first check if

φ

, we know that any closed subspaces that contain

d must also contain c . So, after we reach node

d φ

⊕

d .Since φ

⊕

d =

{

cd

}

{

a

}

, there is no need to check

{

, since we know for sure that it can not be closed. For this type of pruning,

we maintain a stack called prelistStack . This stack contains elements called

ad

}

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home