compression/approximation, and semantic pattern annotation. Let's take a moment to consider why this field has generated so much attention. What are some of the application areas in which frequent pattern mining is useful? This section presents an overview of applications for frequent pattern mining. We have touched on several application areas already, such as market basket analysis and correlation analysis, yet frequent pattern mining can be applied to many other areas as well. These range from data preprocessing and classification to clustering and the analysis of complex data.
To summarize, frequent pattern mining is a data mining task that discovers patterns that occur frequently together and/or have distinctive properties that set them apart from others, often disclosing something inherent and valuable. The patterns may be itemsets, subsequences, substructures, or values. The task also includes the discovery of rare patterns, revealing items that occur very rarely together yet are of interest. Uncovering frequent patterns and rare patterns leads to many broad and interesting applications, described next.
Pattern mining is widely used for noise filtering and data cleaning as a preprocessing step in many data-intensive applications. For instance, we can use it to analyze microarray data, which typically consists of tens of thousands of dimensions (e.g., representing genes) and can be rather noisy. Frequent pattern mining can help us distinguish noise from meaningful signal. We may assume that items that occur frequently together are less likely to be random noise and should not be filtered out. On the other hand, items that occur in almost every record (similar to stopwords in text documents) are likely nondistinctive and may be filtered out. In this way, frequent pattern mining helps identify background information and reduce noise.
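The two heuristics above can be sketched in Python. This is a minimal illustration under stated assumptions, not an established algorithm: an item is treated as signal if it belongs to at least one item pair whose relative support reaches `min_pair_sup`, and as stopword-like if its own relative support exceeds `max_item_sup`. The function name `pattern_based_filter`, both thresholds, and the toy transactions with hypothetical gene labels are illustrative choices.

```python
from collections import Counter
from itertools import combinations


def pattern_based_filter(transactions, min_pair_sup, max_item_sup):
    """Keep items that co-occur frequently but are not stopword-like (hypothetical helper)."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    # Assumption: items in at least one frequent pair are signal, not noise.
    supported = {
        item
        for pair, count in pair_counts.items()
        if count / n >= min_pair_sup
        for item in pair
    }
    # Items present in almost every transaction behave like stopwords.
    stopword_like = {i for i, c in item_counts.items() if c / n > max_item_sup}
    keep = supported - stopword_like
    return [sorted(item for item in set(t) if item in keep) for t in transactions]


# Toy data: hypothetical gene labels; "ubiq" appears in every transaction.
transactions = [
    ["g1", "g2", "ubiq"],
    ["g1", "g2", "ubiq", "g5"],
    ["g1", "g2", "ubiq"],
    ["g3", "ubiq"],
    ["g4", "ubiq"],
]
cleaned = pattern_based_filter(transactions, min_pair_sup=0.4, max_item_sup=0.9)
print(cleaned)  # [['g1', 'g2'], ['g1', 'g2'], ['g1', 'g2'], [], []]
```

Here the ubiquitous item is dropped as nondistinctive, while g3, g4, and g5, which never co-occur frequently with anything, are treated as noise. Real microarray measurements are continuous, so they would typically need to be discretized into items first, and both thresholds would need tuning.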
Pattern mining often helps in the discovery of inherent structures and clusters hidden in the data. Given the DBLP data set, for instance, frequent pattern mining can easily find interesting clusters like coauthor clusters (by examining authors who frequently collaborate) and conference clusters (by examining conferences that share many common authors and terms). Such structure or cluster discovery can be used as preprocessing for more sophisticated data mining.
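A minimal sketch of the coauthor-cluster idea, assuming each paper is given as a list of author names: author pairs that appear together on at least `min_count` papers are treated as frequent 2-itemsets, and authors linked by such pairs are merged into one cluster with a union-find structure. The helper `coauthor_clusters`, the threshold, and the papers and names are made up for illustration, not taken from DBLP; conference clusters (based on shared authors and terms) are not covered.

```python
from collections import Counter
from itertools import combinations


def coauthor_clusters(papers, min_count):
    """Group authors connected by frequently co-occurring author pairs (hypothetical helper)."""
    # Support count of every author pair (2-itemset) across papers.
    pair_counts = Counter(
        pair for authors in papers for pair in combinations(sorted(set(authors)), 2)
    )
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two authors of every frequent pair into the same cluster.
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            parent[find(a)] = find(b)

    clusters = {}
    for author in list(parent):
        clusters.setdefault(find(author), set()).add(author)
    return sorted(sorted(members) for members in clusters.values())


papers = [
    ["Ann", "Bob"],
    ["Ann", "Bob", "Cy"],
    ["Bob", "Cy"],
    ["Bob", "Cy"],
    ["Dee", "Eve"],
    ["Dee", "Eve"],
    ["Ann", "Eve"],  # a one-off collaboration, below min_count
]
print(coauthor_clusters(papers, min_count=2))  # [['Ann', 'Bob', 'Cy'], ['Dee', 'Eve']]
```

The single Ann and Eve paper does not reach the support threshold, so it does not merge the two groups; raising or lowering `min_count` trades cluster size against robustness to occasional collaborations.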
Although there are numerous classification methods (Chapters 8 and 9), research has found that frequent patterns can be used as building blocks in the construction of high-quality classification models, an approach called pattern-based classification. The approach is successful because (1) the appearance of very infrequent item(s) or itemset(s) can be caused by random noise and may not be reliable for model construction, whereas a relatively frequent pattern often carries more information gain for constructing more reliable models; (2) patterns in general (i.e., itemsets consisting of multiple attributes) usually carry more information gain than a single attribute (feature); and (3) the patterns so generated are often intuitively understandable and easy to explain. Recent research has reported several methods that mine interesting, frequent, and discriminative patterns and use them for effective classification. Pattern-based classification methods are introduced in Chapter 9.
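To make points (1) and (2) concrete, the following sketch mines frequent itemsets of size 1 and 2 by brute-force enumeration and ranks them by the information gain of the binary feature "transaction contains the pattern." It is an illustrative assumption, not one of the published pattern-based classifiers mentioned above; the helper names, `min_sup`, and the toy data (class + exactly when both a and b are present) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(transactions, labels, pattern):
    """Information gain of the binary feature 'transaction contains pattern'."""
    inside = [y for t, y in zip(transactions, labels) if pattern <= t]
    outside = [y for t, y in zip(transactions, labels) if not pattern <= t]
    n = len(labels)
    cond = len(inside) / n * entropy(inside) + len(outside) / n * entropy(outside)
    return entropy(labels) - cond


def frequent_patterns(transactions, min_sup, max_len=2):
    """Brute-force enumeration of itemsets up to max_len with relative support >= min_sup."""
    n = len(transactions)
    counts = Counter(
        frozenset(combo)
        for t in transactions
        for k in range(1, max_len + 1)
        for combo in combinations(sorted(t), k)
    )
    return [p for p, c in counts.items() if c / n >= min_sup]


def rank_patterns(transactions, labels, min_sup):
    """Frequent patterns sorted by decreasing information gain (ties broken by items)."""
    pats = frequent_patterns(transactions, min_sup)
    return sorted(pats, key=lambda p: (-info_gain(transactions, labels, p), sorted(p)))


raw = [{"a", "b"}, {"a", "b"}, {"a"}, {"a"}, {"b"}, {"b"}, {"c"}, set()]
transactions = [frozenset(t) for t in raw]
labels = ["+", "+", "-", "-", "-", "-", "-", "-"]

ranked = rank_patterns(transactions, labels, min_sup=0.25)
print([sorted(p) for p in ranked])  # [['a', 'b'], ['a'], ['b']]
print(round(info_gain(transactions, labels, frozenset({"a", "b"})), 3))  # 0.811
print(round(info_gain(transactions, labels, frozenset({"a"})), 3))  # 0.311
```

On this toy data the pattern {a, b} separates the classes perfectly, whereas {a} or {b} alone gains much less, and the rare item c is discarded by the support threshold before it can contribute an unreliable feature. A classifier would then use the top-ranked patterns as features.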
Frequent patterns can also be used effectively for subspace clustering in high-dimensional space. Clustering is challenging in high-dimensional space, where the distance between two objects is often difficult to measure. This is because such a distance is dominated by the different sets of dimensions in which the objects reside.
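The distance problem can be illustrated numerically. In this hypothetical 6-dimensional example, points a and b nearly coincide in the subspace formed by dimensions 0 and 1 but differ in the other four, while point c is far from a in that subspace and close to it elsewhere. The helper `euclidean` and all coordinates are assumptions made up for the illustration.

```python
import math
from typing import Optional, Sequence


def euclidean(p: Sequence[float], q: Sequence[float],
              dims: Optional[Sequence[int]] = None) -> float:
    """Euclidean distance restricted to `dims` (all dimensions when None)."""
    if dims is None:
        dims = range(len(p))
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in dims))


a = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
b = [1.1, 0.9, 3.0, -3.0, 3.0, -3.0]  # same group as a in dims 0-1, noisy elsewhere
c = [4.0, -2.0, 0.5, -0.5, 0.5, -0.5]  # far from a in dims 0-1, close elsewhere
subspace = [0, 1]

print(round(euclidean(a, b), 2), round(euclidean(a, c), 2))  # 6.0 4.36
print(round(euclidean(a, b, subspace), 2), round(euclidean(a, c, subspace), 2))  # 0.14 4.24
```

In the full space, c looks closer to a than b does, because the four irrelevant dimensions dominate the sum; restricted to dimensions 0 and 1, the order reverses. Finding such subspaces, in which points do form dense groups, is the task that subspace clustering addresses.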
 