Advanced Pattern Mining - Data Mining: Concepts and Techniques


compression/approximation, and semantic pattern annotation. Let's take a moment to consider why this field has generated so much attention. What are some of the application areas in which frequent pattern mining is useful? This section presents an overview of applications for frequent pattern mining. We have touched on several application areas already, such as market basket analysis and correlation analysis, yet frequent pattern mining can be applied to many other areas as well. These range from data preprocessing and classification to clustering and the analysis of complex data.

To summarize, frequent pattern mining is a data mining task that discovers patterns that occur frequently together and/or have distinctive properties that set them apart from others, often disclosing something inherent and valuable. The patterns may be itemsets, subsequences, substructures, or values. The task also includes the discovery of rare patterns, that is, items that occur together very rarely yet are of interest. Uncovering frequent and rare patterns leads to many broad and interesting applications, described as follows.
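
To make the notions of frequent and rare patterns concrete, the following Python sketch counts the relative support of every itemset of up to two items in a small, made-up transaction list. It then labels itemsets whose support reaches a minimum threshold as frequent and those at or below a low ceiling as rare. The function names, thresholds, and data are illustrative assumptions. Real rare-pattern mining also applies an interestingness measure rather than a support ceiling alone, and exhaustive enumeration like this only suits tiny examples; Apriori-style pruning or FP-growth is what makes mining feasible at scale.

```python
from collections import Counter
from itertools import combinations


def itemset_supports(transactions, max_size=2):
    """Return the relative support of every itemset with up to max_size items."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so each itemset has one key
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()}


def split_frequent_rare(supports, min_sup, rare_max):
    """Label itemsets as frequent (support >= min_sup) or rare (support <= rare_max)."""
    frequent = {s for s, v in supports.items() if v >= min_sup}
    rare = {s for s, v in supports.items() if v <= rare_max}
    return frequent, rare


# Hypothetical market-basket data.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "caviar"},
]
supports = itemset_supports(transactions)
frequent, rare = split_frequent_rare(supports, min_sup=0.5, rare_max=0.25)
print(sorted(frequent))  # [('bread',), ('bread', 'milk'), ('milk',)]
```

Here {bread, milk} is frequent, while {bread, caviar} falls under the rare ceiling; whether such a rare combination is actually interesting would need a further measure.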

Pattern mining is widely used for noise filtering and data cleaning as a preprocessing step in many data-intensive applications. We can use it to analyze microarray data, for instance, which typically consists of tens of thousands of dimensions (e.g., representing genes). Such data can be rather noisy. Frequent pattern data mining can help us distinguish between what is noise and what isn't. We may assume that items that occur frequently together are less likely to be random noise and should not be filtered out. On the other hand, items that occur very frequently (similar to stopwords in text documents) are likely indistinctive and may be filtered out. Frequent pattern mining can thus help identify background information and reduce noise.
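
A minimal sketch of this filtering idea follows. It assumes each sample is represented as the set of genes (items) observed in it, keeps items that take part in at least one frequently co-occurring pair, and then drops items whose own support exceeds a stopword-like ceiling. The helper name `filter_items`, the gene labels, and both thresholds are hypothetical choices for illustration, not a published procedure.

```python
from collections import Counter
from itertools import combinations


def filter_items(transactions, min_pair_sup, max_item_sup):
    """Hypothetical noise filter: keep items in frequent pairs, drop ubiquitous items."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    # Items that co-occur often with some other item are unlikely to be random noise.
    kept = set()
    for (a, b), c in pair_counts.items():
        if c / n >= min_pair_sup:
            kept.update((a, b))

    # Items present in nearly every transaction behave like stopwords.
    ubiquitous = {i for i, c in item_counts.items() if c / n > max_item_sup}
    return kept - ubiquitous


# Each set lists the genes (items) observed in one made-up sample.
samples = [
    {"g1", "g2", "g9"},
    {"g1", "g2", "g5", "g9"},
    {"g1", "g2", "g9"},
    {"g3", "g9"},
]
print(sorted(filter_items(samples, min_pair_sup=0.5, max_item_sup=0.9)))  # ['g1', 'g2']
```

In this toy run, g3 and g5 never co-occur often with anything and are treated as possible noise, while g9 appears in every sample and is discarded as indistinctive, leaving g1 and g2.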

Pattern mining often helps in the discovery of inherent structures and clusters hidden in the data. Given the DBLP data set, for instance, frequent pattern mining can easily find interesting clusters like coauthor clusters (by examining authors who frequently collaborate) and conference clusters (by examining conferences that share many common authors and terms). Such structure or cluster discovery can be used as preprocessing for more sophisticated data mining.
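
The coauthor example can be sketched by mining author pairs that appear together on at least `min_count` papers and then merging such pairs into connected groups with a union-find structure. This is a simplified stand-in for the clustering the text describes, assuming each paper is given as a list of author names; the helper name, the author names, and the threshold below are made up.

```python
from collections import Counter
from itertools import combinations


def coauthor_clusters(papers, min_count):
    """Group authors connected by pairs that coauthored at least min_count papers."""
    pair_counts = Counter()
    for authors in papers:
        pair_counts.update(combinations(sorted(set(authors)), 2))

    parent = {}  # union-find forest over authors that appear in frequent pairs

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two authors of every frequent pair into one group.
    for (a, b), c in pair_counts.items():
        if c >= min_count:
            ra, rb = find(a), find(b)
            parent[ra] = rb

    groups = {}
    for author in list(parent):
        groups.setdefault(find(author), set()).add(author)
    return sorted(groups.values(), key=sorted)


papers = [  # hypothetical author lists, one per paper
    ["Ann", "Bob"],
    ["Ann", "Bob", "Cy"],
    ["Bob", "Cy"],
    ["Dee", "Eve"],
    ["Dee", "Eve"],
    ["Ann", "Eve"],
]
print(coauthor_clusters(papers, min_count=2))
```

The single Ann-Eve paper does not meet the threshold, so the result is two groups, {Ann, Bob, Cy} and {Dee, Eve}, rather than one chain linking everyone.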

Although there are numerous classification methods (Chapters 8 and 9), research has found that frequent patterns can be used as building blocks in the construction of high-quality classification models, an approach called pattern-based classification. The approach is successful because (1) the appearance of very infrequent item(s) or itemset(s) can be caused by random noise and may not be reliable for model construction, whereas a relatively frequent pattern often carries more information gain for constructing more reliable models; (2) patterns in general (i.e., itemsets consisting of multiple attributes) usually carry more information gain than a single attribute (feature); and (3) the patterns so generated are often intuitively understandable and easy to explain. Recent research has reported several methods that mine interesting, frequent, and discriminative patterns and use them for effective classification. Pattern-based classification methods are introduced in Chapter 9.
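
Point (2) can be checked on a toy example. In the sketch below, the data are hypothetical and the class is the exclusive OR of items a and b. Each single item then has zero information gain, whereas the binary feature "contains the itemset {a, b}" has positive gain. The helper names `entropy` and `info_gain` are assumptions made for this illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, pattern):
    """Information gain of the binary feature 'row contains every item in pattern'."""
    has = [y for row, y in zip(rows, labels) if pattern <= row]
    lacks = [y for row, y in zip(rows, labels) if not pattern <= row]
    n = len(labels)
    conditional = len(has) / n * entropy(has) + len(lacks) / n * entropy(lacks)
    return entropy(labels) - conditional


# Hypothetical data: the class label is a XOR b.
rows = [set(), {"b"}, {"a"}, {"a", "b"}]
labels = [0, 1, 1, 0]

for pattern in ({"a"}, {"b"}, {"a", "b"}):
    print(sorted(pattern), round(info_gain(rows, labels, pattern), 3))
# ['a'] 0.0
# ['b'] 0.0
# ['a', 'b'] 0.311
```

A pattern-based classifier would typically add such itemset features alongside the single-attribute ones and keep the most discriminative of them.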

Frequent patterns can also be used effectively for subspace clustering in high-dimensional space. Clustering is challenging in high-dimensional space, where the distance between two objects is often difficult to measure. This is because such a distance is dominated by the different sets of dimensions in which the objects reside.
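
The effect of irrelevant dimensions on distance can be seen with a small numeric example. The sketch below assumes three made-up 50-dimensional points. Points p and q coincide on dimensions 0 and 1 but differ on the other 48, while r matches p on those 48 but differs on dimensions 0 and 1. In the full space r appears closer to p, yet in the two-dimensional subspace p and q are identical. The helper `subspace_distance` is hypothetical, and the example only illustrates the measurement problem; it is not a subspace clustering algorithm. It uses `math.dist`, available since Python 3.8.

```python
from math import dist


def subspace_distance(x, y, dims):
    """Euclidean distance using only the listed dimensions."""
    return dist([x[d] for d in dims], [y[d] for d in dims])


# Made-up 50-dimensional points.
p = [1.0, 2.0] + [0.0] * 48
q = [1.0, 2.0] + [1.0] * 48  # same as p on dims 0-1, differs on the other 48
r = [4.0, 6.0] + [0.0] * 48  # same as p on the other 48, differs on dims 0-1

print(round(dist(p, q), 2), round(dist(p, r), 2))  # 6.93 5.0
print(subspace_distance(p, q, [0, 1]), subspace_distance(p, r, [0, 1]))  # 0.0 5.0
```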
