Faster Algorithms for Constructing a Concept (Galois) Lattice - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

need regenerate the concepts again and again. This can be easily accomplished

by considering the right siblings only in the procedure S PROUT , i.e. changing the

line 3 to for i

∈ nbr ( a ) AND i>s do , while the other parts of the algorithm

remain the same. Depending on the lattice structure, this can significantly speed

up the algorithm as the number of siblings is decreasing in a cascading fashion. A

more careful analysis is needed for the running time of this algorithm.

8.5.2. Algorithm 3: Constructing a Closed Itemset Lattice

In data mining, one is interested in large concepts, i.e. ( obj ( X ) ,X ) where

| obj ( X )

is larger than a threshold. Although our algorithm can naturally be mod-

ified to construct such a closed itemset lattice: we stop processing a concept when

the size of its object set is less than the given threshold, where objects correspond

to transactions and attributes correspond to items. Theoretically, when the mem-

ory requirement is not a concern, our algorithm is faster than all other existing

algorithms (including the state-of-art program CHARM-L) for constructing such

a frequent closed itemset lattice. However, in practice, for large data sets (as those

studied in data mining), the data structure - a trie on objects (transactions) - re-

quires huge memory and this may threaten the algorithm's practical efficiency.

However, it is not difficult to modify our algorithm so that a trie on attributes

(items) instead is used. Recall that a trie on objects are required in two steps of

our algorithm: testing the closure of an attribute set and testing the existence of

a concept. As noted above, the existence of a concept can also be tested on its

intent (i.e. attributes), thus we can use a trie on attributes for testing the exis-

tence of a concept. To avoid using a trie on objects for testing the closure of an

attribute set, we can use the first characterization in Proposition 8.3 instead, that

is, we test the closure of an attribute set by using subset testing of its object set

against its siblings' object set, as described in Section 8.3. Further, we can employ

the practically efficient technique diffset as in CHARM(-L) for both our S PROUT

procedure and subset testing operations. We are testing the performance of the

diffset based implementation on the available benchmarks and the results will be

reported elsewhere.

|

8.6. Discussion

Our interest in FCA stems from our research in microarray data analysis [8]. We

have implemented an not yet optimized version of our algorithm (with less than

500 effective lines in C++). The program is very efficient for our applications, in

which our data consists of about 10000 objects and 29 attributes. It took less than 1

Search WWH ::

Custom Search

Home