Such techniques can also be applied to sequences in the form of methods such as SeqIndex. A variety of methods such as Grafil [133] and PIS [134] have been developed in this context.
7 Web Mining Applications
In these cases, Web logs, linkage patterns, and content are processed in order to determine important frequent and sequential patterns [46, 69]. A discussion of frequent pattern mining algorithms for Web log data may be found in [65]. A variety of different patterns can be mined from Web data. The key types of Web mining correspond to Web log mining and linkage structure mining. These are described in the subsections below.
7.1 Web Log Mining
Web logs contain data about user accesses in a standard format. Each log entry typically contains the IP address of the accessing host, the time stamp, the Web page accessed, the referrer, and a few other pieces of meta-information about the access. In such cases,
it is useful to determine frequent access patterns in the logs. Such information can be
very useful for designing the site in order to maximize accesses. Furthermore, Web
log mining can also be used in the context of problems such as anomaly detection, in which unusual access sequences that do not conform to the normal patterns in the logs are identified for outlier analysis. Web log mining has also been used by educators in order to evaluate and discriminate between learners' access behaviors, especially in the context of scenarios such as distance learning. The earliest work
on Web log mining was performed in [30], in which frequent and sequential pattern analysis was used to determine important Web log patterns. The algorithm in this paper distinguishes between forward references and backward references during a user's traversal of the Web graph. Forward references correspond to the user clicking through to a new page, whereas backward references correspond to the user revisiting a previously accessed page. Correspondingly, it defines the concept of maximal forward references, which correspond to maximal sequences
of forward traversals. The first step is to use the Web logs in order to create a
database of maximal forward sequences in a pre-processing stage. Subsequently,
frequent pattern mining algorithms are applied to this database in order to determine
the most relevant patterns.
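As a concrete illustration of this pre-processing stage, the following sketch (in Python) converts a single user's click sequence into its maximal forward references: a backward reference truncates the current forward path, and a forward path is emitted whenever it can no longer be extended. The function name and the representation of a traversal as a list of page identifiers are illustrative assumptions; this is a minimal sketch of the idea in [30], not the authors' implementation.

    def maximal_forward_references(traversal):
        # Split one user's page traversal into maximal forward references.
        # A backward reference (revisiting a page on the current path) ends
        # the current forward path, which is emitted if it has been extended
        # since the last emission (i.e., it is maximal).
        path = []          # current forward path
        extended = False   # True if the path grew since the last emission
        results = []
        for page in traversal:
            if page in path:
                if extended:
                    results.append(list(path))
                    extended = False
                # truncate the path back to the revisited page
                path = path[:path.index(page) + 1]
            else:
                path.append(page)
                extended = True
        if extended:
            results.append(list(path))
        return results

    # Example: the traversal A,B,C,D,C,B,E,G,H,G,W,A,O,U,O,V yields the
    # maximal forward references ABCD, ABEGH, ABEGW, AOU and AOV.
    print(maximal_forward_references(list("ABCDCBEGHGWAOUOV")))

The resulting database of maximal forward sequences can then be handed to any standard frequent or sequential pattern mining algorithm, as described above.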
A different method for finding path traversal patterns has been proposed in [102]. In this method, the major assumption is that irrelevant patterns may be interleaved with other, more relevant patterns. This work defines a relevant pattern on the basis of the notion of subpath containment. The algorithm takes into account the underlying graph structure in order to determine the most relevant patterns. One major difference from the work in [30] is that the candidate patterns need to be paths in the underlying graph.
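Although the precise definitions in [102] are more involved, the following sketch illustrates the two ingredients informally described above: candidate patterns are restricted to paths in the site's link graph, and containment in a traversal tolerates interleaved, irrelevant accesses. The adjacency-set representation, the function names, and the simplified subsequence-based containment test are assumptions made purely for illustration, not the exact formulation of [102].

    def is_graph_path(candidate, adjacency):
        # A candidate pattern qualifies only if every consecutive pair of
        # pages is connected by a hyperlink in the site's link graph,
        # i.e., the pattern is realizable as a traversal of the graph.
        return all(b in adjacency.get(a, set())
                   for a, b in zip(candidate, candidate[1:]))

    def contains_pattern(traversal, pattern):
        # Test whether the pattern occurs in the traversal as a (not
        # necessarily contiguous) subsequence, so that interleaved
        # irrelevant accesses do not hide an otherwise relevant pattern.
        remaining = iter(traversal)
        return all(page in remaining for page in pattern)

    # Example with a hypothetical three-page site A -> B -> C:
    adjacency = {"A": {"B"}, "B": {"C"}, "C": set()}
    print(is_graph_path(["A", "B", "C"], adjacency))          # True
    print(contains_pattern(list("AXBYC"), ["A", "B", "C"]))   # True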