Web graph, and not any arbitrary sequence of vertices. This ensures that irrelevant
vertices are less likely to be considered in the process of mining relevant patterns.
An Apriori-like algorithm is used in [ 102 ] for this purpose, except that it is modified
to ensure that the candidates also correspond to paths in the underlying Web graph.
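The graph-constrained candidate generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [ 102 ]: sessions are assumed to be lists of page identifiers, support is assumed to count sessions that contain a path as a contiguous run, and the names `frequent_paths`, `sessions`, `edges`, and `min_support` are hypothetical.

```python
def frequent_paths(sessions, edges, min_support, max_len=4):
    """Level-wise (Apriori-like) mining of traversal paths.

    A candidate is only generated by extending a frequent path along
    an edge of the Web graph, so every candidate is itself a graph path.
    """
    edge_set = set(edges)

    def support(path):
        # Number of sessions that contain `path` as a contiguous run.
        k = len(path)
        return sum(
            any(tuple(s[i:i + k]) == path for i in range(len(s) - k + 1))
            for s in sessions
        )

    # Level 1: frequent single pages.
    pages = {p for s in sessions for p in s}
    level = {(p,): support((p,)) for p in pages}
    level = {p: c for p, c in level.items() if c >= min_support}
    result = dict(level)

    k = 1
    while level and k < max_len:
        candidates = set()
        for path in level:
            for (u, v) in edge_set:
                if u != path[-1]:
                    continue  # extension must follow a hyperlink
                cand = path + (v,)
                # Apriori pruning: the suffix must also be frequent.
                if cand[1:] in level:
                    candidates.add(cand)
        level = {c: support(c) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_support}
        result.update(level)
        k += 1
    return result


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
sessions = [
    ["A", "B", "C"],
    ["A", "B", "C", "D"],
    ["B", "C", "D"],
    ["A", "D"],
    ["A", "D"],
]
patterns = frequent_paths(sessions, edges, min_support=2)
```

In this toy log the pair A, D occurs in two sessions, but it is never counted because there is no A to D edge in the assumed graph; this is the effect of restricting candidates to graph paths.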
Methods for data preparation of Web traversal patterns are proposed in [ 37 ]. Data
preparation is a key issue in the process of finding the correct traversal patterns,
because Web logs are inherently noisy. The ability to find the correct patterns therefore
depends upon the ability to process these logs properly. The work in [ 37 ] provides
an excellent overview of methods for processing these logs. Other methods for Web
log usage mining are discussed in [ 36 , 108 , 110 , 119 , 129 , 138 ].
A useful application of association rule mining is personalization. It is a natural
fit because correlations in user behavior can be used to group users by interest
and to make recommendations. Methods for using associations to perform
recommendations are discussed in [ 100 , 101 ]. Recommendations can also be viewed
as supervised learning problems, which can be solved effectively with rule-based methods.
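As a rough illustration of how associations can drive recommendations, the sketch below mines pairwise rules from sessions and suggests unvisited pages. It is a simplification made for this example and not the method of [ 100 , 101 ]: only single-page antecedents are considered, and the names `pairwise_rules`, `recommend`, `min_support`, and `min_conf` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(sessions, min_support, min_conf):
    """Mine rules x -> y between single pages from a list of sessions."""
    item_count = Counter()
    pair_count = Counter()
    for s in sessions:
        items = set(s)
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))

    rules = {}
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = n / item_count[x]  # confidence of x -> y
            if conf >= min_conf:
                rules[(x, y)] = conf
    return rules


def recommend(rules, visited, top_k=3):
    """Rank unvisited pages by the best confidence of any matching rule."""
    scores = {}
    for (x, y), conf in rules.items():
        if x in visited and y not in visited:
            scores[y] = max(scores.get(y, 0.0), conf)
    # Sort by descending confidence, then by name for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


sessions = [
    {"home", "laptops", "mice"},
    {"home", "laptops", "mice"},
    {"home", "laptops"},
    {"home", "phones"},
]
rules = pairwise_rules(sessions, min_support=2, min_conf=0.6)
suggestions = recommend(rules, visited={"laptops"})
```

Scoring a page by its single best rule is one of several reasonable choices; summing or averaging confidences over all matching rules would be an equally valid assumption.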
7.2 Web Linkage Mining
In Web linkage mining methods, the structure of the Web graph is mined for patterns,
rather than the user traversal patterns. Mining the Web graph for patterns is closely
related to the problem of community detection on the Web graph. In fact, such
an approach can also be used for other kinds of large scale graphs such as social
networks. Frequent patterns can be used to compress very large graphs, and the
compressed representation can then be used for clustering. Such an approach has been
proposed in [ 26 ] for mining communities with the use of compressed patterns. The
method, known as VirtualNodeMiner, achieves graph compression by generating
virtual nodes from frequent itemsets in vertex adjacency lists. Another algorithm,
which focuses on mining frequent patterns from massive networks, is the gApprox
algorithm [ 31 ]. The key in this approach is that the approximation process allows
the creation of an anti-monotonicity constraint, which can be pushed into the mining
process. Another method has also been proposed in [ 10 ] for mining communities
from multiple graphs (rather than a single large graph) with the use of frequent
patterns, though this method is designed for smaller graphs, and not particularly
focussed on the scenario of the World Wide Web.
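The idea of compressing a graph with virtual nodes can be illustrated with a single compression step. The sketch below is a simplified assumption and not the VirtualNodeMiner procedure of [ 26 ]: the frequent itemset (`pattern`) is taken as given rather than mined, adjacency lists are assumed to be Python sets, and the function name `compress_with_virtual_node` is hypothetical.

```python
def compress_with_virtual_node(adj, pattern, virtual_name):
    """Replace a shared set of out-neighbors with one virtual node.

    Every vertex whose adjacency set contains `pattern` is rewired to
    point at `virtual_name`, which in turn points at the pattern's
    members. Returns the new adjacency map and the number of edges saved.
    """
    pattern = frozenset(pattern)
    holders = [u for u, nbrs in adj.items() if pattern <= nbrs]

    # Before: len(holders) * len(pattern) edges.
    # After:  len(holders) edges to the virtual node + len(pattern) from it.
    saved = len(holders) * len(pattern) - (len(holders) + len(pattern))
    if saved <= 0:
        return dict(adj), 0  # compression would not reduce the edge count

    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}
    for u in holders:
        new_adj[u] = (new_adj[u] - pattern) | {virtual_name}
    new_adj[virtual_name] = set(pattern)
    return new_adj, saved


adj = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b", "c", "d"},
    "p3": {"a", "b", "c"},
    "p4": {"d"},
}
compressed, saved = compress_with_virtual_node(adj, {"a", "b", "c"}, "v1")
```

Here three vertices share the out-neighbors a, b, and c, so nine edges are replaced by six. In a full system the patterns would come from a frequent itemset miner run over the adjacency lists, and the step would be repeated.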
8 Frequent Patterns for Text Mining
Frequent patterns have significant applications to text mining, both in terms of
positional and non-positional co-occurrence. Positional co-occurrence corresponds to
scenarios in which certain words co-occur together from a perspective of adjacency