of occurrence. Such patterns can typically be found either by adapting methods from
sequence pattern mining, or by using constrained frequent pattern mining methods.
The latter case is related to the problem of finding frequent bigrams, trigrams, or
phrases in the underlying data. Such frequent patterns can be used in order to enrich
the underlying text representation for a variety of indexing and mining problems. For
example, clustering and classification algorithms can typically benefit from a richer
feature representation that contains the frequent phrases in the collection. A specific
example of improved text classification with the use of n-gram representations
is discussed in [29]. It has been shown [19] that the expansion of query terms with
relevant phrases can significantly enrich a variety of search applications. Therefore,
frequent patterns can be used in order to expand the search phrases and enhance the
quality of the search. It has been shown in [85] that such rules can be applied not
only to individual words, but also to the paths in the dependency trees of a parsed
corpus.
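As a rough illustration of enriching a text representation with frequent phrases, the following minimal sketch counts bigrams by document-level support and appends the frequent ones as extra features. The function names `frequent_ngrams` and `enrich`, the whitespace tokenizer, the toy documents, and the `min_support` threshold are all assumptions made for this example; they are not taken from the methods cited above.

```python
from collections import Counter


def frequent_ngrams(docs, n=2, min_support=2):
    """Return n-grams that occur in at least `min_support` documents."""
    support = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # assumed tokenizer: whitespace split
        # A set counts each n-gram at most once per document.
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        support.update(grams)
    return {g: c for g, c in support.items() if c >= min_support}


def enrich(doc, phrases, n=2):
    """Return the tokens of `doc` followed by its frequent n-grams."""
    tokens = doc.lower().split()
    extra = [
        "_".join(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
        if tuple(tokens[i:i + n]) in phrases
    ]
    return tokens + extra


# Hypothetical toy collection.
docs = [
    "frequent pattern mining finds patterns",
    "text mining uses frequent pattern mining",
    "pattern mining scales to large text",
]
phrases = frequent_ngrams(docs, n=2, min_support=2)
features = enrich(docs[0], phrases)
```

Here `features` holds the original words of the first document plus the phrase features `frequent_pattern` and `pattern_mining`, which a downstream classifier or clustering method could treat as additional dimensions.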
Frequent patterns have also been used in order to explore interesting patterns
in text collections in terms of temporal and sequential co-occurrence, especially
when the text arrives in the form of a stream. An example of such an approach is
discussed in [42], where frequently occurring trends in text phrases are discovered
in conjunction with visualization methods. Phrases whose frequency increases or
decreases over time provide valuable hints about the key trends in the underlying
text streams. Since many forms of social network content and news wire services
provide text streams, such methods can provide useful tools in terms of exploring
the changes in the behavior of the underlying collection. In addition, association
rules have been shown to be very useful in providing visual representations of the
underlying text collection [42, 91, 128].
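The idea of tracking phrases whose frequency rises or falls over time can be sketched as follows. This is only a simplified illustration, not the method of [42]: it assumes the stream has already been bucketed into integer time windows, compares only the first and last window, and uses hypothetical names (`phrase_counts_by_window`, `trending`) and a hypothetical `min_change` threshold.

```python
from collections import Counter, defaultdict


def phrase_counts_by_window(stream, n=2):
    """Map each time window to a Counter of the n-grams seen in it."""
    counts = defaultdict(Counter)
    for window, text in stream:
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[window][" ".join(tokens[i:i + n])] += 1
    return counts


def trending(counts, min_change=2):
    """Return phrases whose count changed by at least `min_change`
    between the earliest and the latest window."""
    windows = sorted(counts)
    if not windows:
        return {}
    first, last = counts[windows[0]], counts[windows[-1]]
    changes = {}
    for phrase in set(first) | set(last):
        delta = last[phrase] - first[phrase]  # missing phrases count as 0
        if abs(delta) >= min_change:
            changes[phrase] = delta
    return changes


# Hypothetical stream of (window index, text) pairs.
stream = [
    (0, "stock market falls"),
    (0, "stock market news"),
    (1, "election results announced"),
    (1, "election results disputed"),
    (1, "stock market news"),
]
trends = trending(phrase_counts_by_window(stream), min_change=2)
```

A positive value marks a phrase that is gaining frequency and a negative value one that is fading. A more realistic version would likely normalize by window size and examine more than two windows.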
A significant number of applications also exist for mining of frequent patterns
without adjacency constraints. Such frequent patterns can be used for co-clustering
of text documents [4], or for indexing text documents with the use of conceptual
phrases [6]. The idea in these methods is that simultaneous discovery of relevant
word patterns and clusters is generally more effective than the discovery of each of
them individually on a global basis. In the context of clustering, numerous methods
have been proposed that use the frequent itemsets in the text collection [21, 51, 83]
to measure the similarities between the documents for the clustering process.
A detailed discussion of many of these applications of frequent pattern mining to text
collections may be found in [7].
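To make the use of frequent itemsets for document similarity concrete, here is a minimal sketch that mines frequent word pairs without any adjacency constraint and compares documents by the frequent pairs they contain. The restriction to 2-itemsets, the Jaccard measure, and the names `frequent_word_pairs` and `itemset_similarity` are assumptions chosen for brevity; the cited clustering methods may define similarity differently.

```python
from collections import Counter
from itertools import combinations


def frequent_word_pairs(docs, min_support=2):
    """Return unordered word pairs that co-occur in at least
    `min_support` documents, regardless of word position."""
    support = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        # Sorting gives each pair one canonical (a, b) form with a < b.
        support.update(combinations(words, 2))
    return {pair for pair, count in support.items() if count >= min_support}


def itemset_similarity(doc_a, doc_b, itemsets):
    """Jaccard similarity over the frequent itemsets each document covers."""
    def covered(doc):
        words = set(doc.lower().split())
        return {s for s in itemsets if words.issuperset(s)}

    a, b = covered(doc_a), covered(doc_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy collection.
docs = [
    "mining frequent patterns in text",
    "text patterns from frequent mining",
    "clustering text documents",
    "documents clustering methods",
]
pairs = frequent_word_pairs(docs, min_support=2)
sim_same_topic = itemset_similarity(docs[0], docs[1], pairs)
sim_cross_topic = itemset_similarity(docs[0], docs[2], pairs)
```

The first two documents share all of their frequent pairs even though the word order differs, while the first and third share none. A similarity matrix built this way could then be passed to any standard clustering routine.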
9 Temporal Applications
Temporal applications correspond to scenarios in which the data is presented either
in the form of continuous time series or discrete sequences. The two cases are quite
similar, since continuous time series can be discretized into discrete sequences with
the use of a variety of methods such as SAX [86]. The SAX method discretizes the
average values in small time windows into a set of discrete values. Subsequently,