Databases Reference
In-Depth Information
3.4 Optimizing Techniques
In this section, we present several methods to optimize the time complexity of
the basic algorithm. The data is stored as bit strings, that is, each attribute
is represented as a string of 0 and 1. The major operation of our algorithm
is bit intersection. When the percentage of 1 entries in the dataset is larger
than 10%, using bit strings not only saves memory space, it also makes the
computations more e cient.
Reuse Previous Results in Computing
O
The most expensive op eration in Function findnext i s at line 5, where we
need to compute the S = S
a o ). Notice
that for any node in the prefix tree as shown in Fig. 2, its object set can be
computed incrementally from the object set of its parent. That is, the object
of the child node is the intersection of the object set of the parent node and
ψ ( a o ), where a o is the newly added attribute. For example, the object set of
cd can be computed by taking the intersection of the object set of its parent
node c (
a o and its object set O = ψ ( S
). So we can maintain the object sets
of all the nodes on the current branch of the search tree on a stack called
curPath to avoid duplicated intersection operations.
However, when the search moves from one branch to the other, the stack
curpath needs to be updated to maintain the correctness of the object set
computation. For example, after we visited node ad , the next node to be
visited is ac . But the object set of ac can not be incrementally computed
based on the object set of ad , while it can be computed incrementally based
on the object set of a . So we maintain another stack of attribute id called
istack , which keeps track of all the attribute id for which S
{
123467
}
)and ψ ( d )(
{
12467
}
o S
a o is
true. For example, after we find that the next closed subspace after ϕ
ψ ( φ )
is <{ 123467 },c> , we push the object set into curPath and we push c into
stack istack . When we try to find the next closed subspace after c ,wecheck
if o is larger than the top of istack . If yes, that means that we are still on
the same branch of the search tree, so there is no need to change the stack; if
no, that means that we are jumping to a different branch, so pop up all the
elements in iStack that is larger than o . When popping out the elements in
iStack , curPath is also updated in the similar fashion. That is, whenever pop
out an element from iStack , we also pop out an element from curPath .
Stack of Unpromising Nodes
Observe the search tree in Fig. 2. Starting from node φ , we first check if
φ
, we know that any closed subspaces that contain
d must also contain c . So, after we reach node
d φ
d .Since φ
d =
{
cd
}
{
a
}
, there is no need to check
{
, since we know for sure that it can not be closed. For this type of pruning,
we maintain a stack called prelistStack . This stack contains elements called
ad
}
 
Search WWH ::




Custom Search