Database Reference
In-Depth Information
Second, the pattern growth approach has been extended to pattern-based classifica-
tion. Frequent patterns have been demonstrated to be useful for classification, where
association rules are generated and analyzed for effective classification [ 21 , 22 ]by
discovery of strong associations between frequent patterns and class labels. More-
over, a further study [ 10 ] has provided solid reasoning to support the methodology
of frequent pattern-based classification. By building a connection between pattern
frequency and discriminative measures, such as information gain and Fisher score, it
is shown that discriminative frequent patterns are essential for classification, whereas
inclusion of infrequent patterns may not improve the classification accuracy due to
their limited predictive power. A pattern-growth based methodology, called DDP-
Mine [ 11 ], is developed that performs a branch-and-bound search for directly mining
discriminative patterns without generating the complete pattern set. Instead of select-
ing best patterns in a batch, a “feature-centered mining approach is introduced that
generates discriminative patterns sequentially on a progressively shrinking FP-tree by
incrementally eliminating training instances. The instance elimination effectively re-
duces the problem size iteratively and expedites the mining process. Empirical results
show that DDPMine achieves orders of magnitude speedup without downgrading
classification accuracy and outperforms the state-of-the-art associative classification
methods in terms of both accuracy and efficiency.
6
Conclusions
We have presented a pattern-growth methodology for mining multiple kinds of fre-
quent patterns in large databases. Their associated performance studies show that
the algorithms derived from the pattern-growth methodology are more efficient and
scalable than many other frequent pattern mining methods.
According to our analysis, the high performance of the pattern-growth method-
ology is due to the following factors: (1) it adopts a divide-and-conquer strategy to
project and partition a large database recursively into a set of progressively smaller
ones, and the patterns to be searched for in each corresponding projected database are
also reduced substantially; (2) it integrates disk-based database projection algorithms
with main memory-based data structures and fast in-memory traversal algorithms,
which can be well-tuned to achieve combined high performance by swapping disk-
based algorithm into memory-based one when the projected and compressed data
set can fit in memory; and (3) it makes good use of the Apriori property implicitly as
well as other properties, such as the single tree-path property, but avoids generating
a large number of candidates, which ensures each counting and testing is on the
real data sets rather than on the potential candidate sets. These several techniques
combined lead to high performance mining algorithms.
Search WWH ::




Custom Search