Such techniques can also be applied to sequences in the form of methods such as SeqIndex. A variety of methods such as Grafil [133] and PIS [134] have been developed in this context.
7 Web Mining Applications
In these cases, Web logs, linkage patterns, and content are processed in order to determine important frequent and sequential patterns [46, 69]. A discussion of frequent pattern mining algorithms for Web log data may be found in [65]. A variety of different patterns can be mined from Web data. The key types of Web mining correspond to Web log mining and linkage structure mining. These are described in the subsections below.
7.1 Web Log Mining
Web logs contain data about user accesses in a standard format. Each log entry typically contains the IP address of the accessing host, the time stamp, the Web page accessed, the referrer, and a few other pieces of meta-information about the access. In such cases,
it is useful to determine frequent access patterns in the logs. Such information can be
very useful for designing the site in order to maximize accesses. Furthermore, Web
log mining can also be used in the context of problems such as anomaly detection, in which unusual access sequences that do not conform to the normal patterns in the logs are identified for outlier analysis. Web log mining has also been used by educators in order to evaluate and discriminate between learners' access behaviors, especially in the context of scenarios such as distance learning. The earliest work
on Web log mining was performed in [30], in which frequent and sequential pattern analysis was used to determine important Web log patterns. The algorithm in this paper distinguishes between forward references and backward references during a user's traversal of the Web graph. Forward references correspond to the user clicking through to a new page, whereas backward references correspond to the user revisiting a previously accessed page. Correspondingly, it defines the concept of maximal forward references, which correspond to maximal sequences
of forward traversals. The first step is to use the Web logs in order to create a
database of maximal forward sequences in a pre-processing stage. Subsequently,
frequent pattern mining algorithms are applied to this database in order to determine
the most relevant patterns.
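As a concrete illustration of this pre-processing stage, the following sketch (in Python) converts a single user's click sequence into its maximal forward references: a backward reference truncates the current forward path, and a forward path is emitted whenever it can no longer be extended. The function name and the representation of a traversal as a list of page identifiers are illustrative assumptions; this is a minimal sketch of the idea in [30], not the authors' implementation.

    def maximal_forward_references(traversal):
        # Split one user's page traversal into maximal forward references.
        # A backward reference (revisiting a page on the current path) ends
        # the current forward path, which is emitted if it has been extended
        # since the last emission (i.e., it is maximal).
        path = []          # current forward path
        extended = False   # True if the path grew since the last emission
        results = []
        for page in traversal:
            if page in path:
                if extended:
                    results.append(list(path))
                    extended = False
                # truncate the path back to the revisited page
                path = path[:path.index(page) + 1]
            else:
                path.append(page)
                extended = True
        if extended:
            results.append(list(path))
        return results

    # Example: the traversal A,B,C,D,C,B,E,G,H,G,W,A,O,U,O,V yields the
    # maximal forward references ABCD, ABEGH, ABEGW, AOU and AOV.
    print(maximal_forward_references(list("ABCDCBEGHGWAOUOV")))

The resulting database of maximal forward sequences can then be handed to any standard frequent or sequential pattern mining algorithm, as described above.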
A different method for finding path traversal patterns has been proposed in [102]. In this method, the major assumption is that irrelevant patterns may be interleaved with other, more relevant patterns. This work defines a relevant pattern on the basis of the notion of subpath containment. The algorithm takes into account the underlying graph structure in order to determine the most relevant patterns. One major difference from the work in [30] is that the candidate patterns need to be paths in the underlying graph.
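Although the precise definitions in [102] are more involved, the following sketch illustrates the two ingredients informally described above: candidate patterns are restricted to paths in the site's link graph, and containment in a traversal tolerates interleaved, irrelevant accesses. The adjacency-set representation, the function names, and the simplified subsequence-based containment test are assumptions made purely for illustration, not the exact formulation of [102].

    def is_graph_path(candidate, adjacency):
        # A candidate pattern qualifies only if every consecutive pair of
        # pages is connected by a hyperlink in the site's link graph,
        # i.e., the pattern is realizable as a traversal of the graph.
        return all(b in adjacency.get(a, set())
                   for a, b in zip(candidate, candidate[1:]))

    def contains_pattern(traversal, pattern):
        # Test whether the pattern occurs in the traversal as a (not
        # necessarily contiguous) subsequence, so that interleaved
        # irrelevant accesses do not hide an otherwise relevant pattern.
        remaining = iter(traversal)
        return all(page in remaining for page in pattern)

    # Example with a hypothetical three-page site A -> B -> C:
    adjacency = {"A": {"B"}, "B": {"C"}, "C": set()}
    print(is_graph_path(["A", "B", "C"], adjacency))          # True
    print(contains_pattern(list("AXBYC"), ["A", "B", "C"]))   # True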