Biology Reference
In-Depth Information
need regenerate the concepts again and again. This can be easily accomplished
by considering the right siblings only in the procedure S PROUT , i.e. changing the
line 3 to for i
nbr ( a ) AND i>s do , while the other parts of the algorithm
remain the same. Depending on the lattice structure, this can significantly speed
up the algorithm as the number of siblings is decreasing in a cascading fashion. A
more careful analysis is needed for the running time of this algorithm.
8.5.2. Algorithm 3: Constructing a Closed Itemset Lattice
In data mining, one is interested in large concepts, i.e. ( obj ( X ) ,X ) where
| obj ( X )
is larger than a threshold. Although our algorithm can naturally be mod-
ified to construct such a closed itemset lattice: we stop processing a concept when
the size of its object set is less than the given threshold, where objects correspond
to transactions and attributes correspond to items. Theoretically, when the mem-
ory requirement is not a concern, our algorithm is faster than all other existing
algorithms (including the state-of-art program CHARM-L) for constructing such
a frequent closed itemset lattice. However, in practice, for large data sets (as those
studied in data mining), the data structure - a trie on objects (transactions) - re-
quires huge memory and this may threaten the algorithm's practical efficiency.
However, it is not difficult to modify our algorithm so that a trie on attributes
(items) instead is used. Recall that a trie on objects are required in two steps of
our algorithm: testing the closure of an attribute set and testing the existence of
a concept. As noted above, the existence of a concept can also be tested on its
intent (i.e. attributes), thus we can use a trie on attributes for testing the exis-
tence of a concept. To avoid using a trie on objects for testing the closure of an
attribute set, we can use the first characterization in Proposition 8.3 instead, that
is, we test the closure of an attribute set by using subset testing of its object set
against its siblings' object set, as described in Section 8.3. Further, we can employ
the practically efficient technique diffset as in CHARM(-L) for both our S PROUT
procedure and subset testing operations. We are testing the performance of the
diffset based implementation on the available benchmarks and the results will be
reported elsewhere.
|
8.6. Discussion
Our interest in FCA stems from our research in microarray data analysis [8]. We
have implemented an not yet optimized version of our algorithm (with less than
500 effective lines in C++). The program is very efficient for our applications, in
which our data consists of about 10000 objects and 29 attributes. It took less than 1
Search WWH ::




Custom Search