Applications of Frequent Pattern Mining - Frequent Pattern Mining


of occurrence. Such patterns can typically be found either by adapting methods from

sequence pattern mining, or by using constrained frequent pattern mining methods.

The latter case is related to the problem of finding frequent bigrams, trigrams, or

phrases in the underlying data. Such frequent patterns can be used in order to enrich

the underlying text representation for a variety of indexing and mining problems. For

example, clustering and classification algorithms can typically benefit from a richer

feature representation, which contains the frequent phrases in the collection. A

specific example of improved text classification with the use of n-gram representations

is discussed in [ 29 ]. It has been shown [ 19 ] that the expansion of query terms with

relevant phrases can significantly enrich a variety of search applications. Therefore,

frequent patterns can be used in order to expand the search phrases and enhance the

quality of the search. It has been shown in [ 85 ] that such rules can be applied not

only to individual words, but also to the paths in the dependency trees of a parsed

corpus.
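
As a rough illustration of the frequent bigram and trigram idea described above, the following minimal Python sketch counts n-grams by the number of documents that contain them and keeps those meeting a support threshold. The function name frequent_ngrams, the whitespace tokenization, and the toy documents are assumptions made for this example; they are not taken from any of the cited methods.

```python
from collections import Counter

def frequent_ngrams(docs, n=2, min_support=2):
    """Return n-grams that occur in at least min_support documents (hypothetical helper)."""
    doc_freq = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        # A set ensures each document contributes at most once to an n-gram's support.
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        doc_freq.update(grams)
    return {g: c for g, c in doc_freq.items() if c >= min_support}

# Toy collection, invented for illustration only.
docs = [
    "frequent pattern mining finds frequent patterns",
    "frequent pattern mining for text",
    "text mining with pattern mining",
]
bigrams = frequent_ngrams(docs, n=2, min_support=2)
trigrams = frequent_ngrams(docs, n=3, min_support=2)
```

The surviving n-grams could then be appended to each document's bag-of-words features before clustering or classification, which is one simple way to realize the richer feature representation mentioned in the text.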

Frequent patterns have also been used in order to explore interesting patterns

in text collections in terms of temporal and sequential co-occurrence, especially

when the text arrives in the form of a stream. An example of such an approach is

discussed in [ 42 ], where frequently occurring trends in text phrases are discovered

in conjunction with visualization methods. Phrases whose frequency increases or

decreases over time provide valuable hints about the key trends in the underlying

text streams. Since many forms of social network content and news wire services

provide text streams, such methods can provide useful tools in terms of exploring

the changes in the behavior of the underlying collection. In addition, association

rules have been shown to be very useful in providing visual representations of the

underlying text collection [ 42 , 91 , 128 ].
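
The idea of tracking phrases whose frequency rises or falls over time can be sketched as follows. This is a simplified illustration and not the method of [ 42 ]: it assumes the stream has already been split into numbered time windows, and the names phrase_trends and rising, along with the sample stream, are hypothetical.

```python
from collections import defaultdict

def phrase_trends(stream, n=2):
    """Map each n-gram to its list of counts per time window, in window order."""
    windows = sorted({w for w, _ in stream})
    index = {w: i for i, w in enumerate(windows)}
    counts = defaultdict(lambda: [0] * len(windows))
    for w, text in stream:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])][index[w]] += 1
    return dict(counts)

def rising(trends, min_gain=1):
    """n-grams whose count in the last window exceeds the first by at least min_gain."""
    return {g for g, series in trends.items() if series[-1] - series[0] >= min_gain}

# (window id, text) pairs, invented for illustration only.
stream = [
    (1, "markets fall on rate fears"),
    (2, "rate fears ease"),
    (2, "rate fears return"),
    (3, "rate fears grow as rate fears spread"),
]
trends = phrase_trends(stream)
up = rising(trends)
```

Comparing only the first and last windows is a deliberately crude trend test; a slope fitted over all windows would be less sensitive to noise in any single window.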

A significant number of applications also exist for mining of frequent patterns

without adjacency constraints. Such frequent patterns can be used for co-clustering

of text documents [ 4 ], or for indexing text documents with the use of conceptual

phrases [ 6 ]. The idea in these methods is that simultaneous discovery of relevant

word patterns and clusters is generally more effective than the discovery of each of

them individually on a global basis. In the context of clustering, numerous methods

have been proposed that use the frequent itemsets in the text collection [ 21 , 51 , 83 ]

to measure the similarities between documents for the clustering process.

A detailed discussion of many of these applications of frequent pattern mining to text

collections may be found in [ 7 ].
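
To make the use of frequent itemsets for document similarity more concrete, here is a minimal Python sketch. It is an assumption-laden simplification rather than any of the methods in [ 21 , 51 , 83 ]: it mines only word pairs (itemsets of size two, with no adjacency constraint), and it scores two documents by the Jaccard overlap of the frequent pairs they contain. The names frequent_word_pairs and itemset_similarity are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_word_pairs(docs, min_support=2):
    """Unordered word pairs that co-occur in at least min_support documents."""
    support = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # sorted so each pair has one canonical form
        support.update(combinations(words, 2))
    return {pair for pair, c in support.items() if c >= min_support}

def itemset_similarity(doc_a, doc_b, itemsets):
    """Jaccard similarity over the frequent pairs each document contains."""
    def covered(doc):
        words = set(doc.lower().split())
        return {p for p in itemsets if words.issuperset(p)}
    a, b = covered(doc_a), covered(doc_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy collection, invented for illustration only.
docs = [
    "stock market crash",
    "stock market rally",
    "football match result",
    "football match tonight",
]
pairs = frequent_word_pairs(docs, min_support=2)
same_topic = itemset_similarity(docs[0], docs[1], pairs)
cross_topic = itemset_similarity(docs[0], docs[2], pairs)
```

A pairwise similarity of this kind could be passed to any standard clustering routine; enumerating all word pairs is quadratic in document length, so a real system would use a proper frequent itemset miner instead.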

9 Temporal Applications

Temporal applications correspond to scenarios in which the data is presented either

in the form of continuous time series or discrete sequences. The two cases are quite

similar, since continuous time series can be discretized into discrete sequences with

the use of a variety of methods such as SAX [ 86 ]. The SAX method discretizes the

average values in small time windows into a set of discrete values. Subsequently,
