Online Analytical Mining for Web Access Patterns - Advanced Topics in Database Research


measure of non-independence. An unexpectedly frequent or infrequent pattern was detected, and the algorithm generated only rules rather than frequent sequences.

Another important data dependency that can be discovered using the temporal characteristics of the data is similar time sequences (Mannila, Toivonen & Verkamo, 1995; Srikant & Agrawal, 1996b). For example, we may be interested in finding common characteristics of all clients that visited a particular file within the time period [t1, t2]. Conversely, we may be interested in the time interval (within a day, within a week, etc.) in which a particular file is most accessed.
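
The two temporal questions above can be sketched as follows. This is only an illustration under stated assumptions: the log is a hypothetical in-memory list of (client, file, timestamp) tuples, and the names `clients_in_window` and `busiest_hour` are invented here and do not come from the cited work.

```python
from collections import Counter
from datetime import datetime

# Hypothetical access records: (client, file, timestamp).
LOG = [
    ("c1", "/a.html", datetime(2000, 1, 3, 9, 15)),
    ("c2", "/a.html", datetime(2000, 1, 3, 9, 40)),
    ("c1", "/b.html", datetime(2000, 1, 3, 10, 5)),
    ("c3", "/a.html", datetime(2000, 1, 4, 9, 20)),
    ("c2", "/a.html", datetime(2000, 1, 4, 14, 0)),
]


def clients_in_window(log, path, t1, t2):
    """Return the clients that accessed `path` at least once within [t1, t2]."""
    return {client for client, f, ts in log if f == path and t1 <= ts <= t2}


def busiest_hour(log, path):
    """Return the hour of day (0-23) with the most accesses to `path`, or None."""
    hours = Counter(ts.hour for _, f, ts in log if f == path)
    return hours.most_common(1)[0][0] if hours else None
```

The first function answers "who visited this file in the period [t1, t2]"; the characteristics of those clients could then be compared. The second fixes the file and searches over time instead, here at the granularity of one hour within a day.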

Much work has been done in user behavior analysis. Chen, Park and Yu (1998) explored mining path traversal patterns in a distributed information environment, but considered only one ordered dimension: the forward-referenced pages/URLs accessed.

Web Usage Mining

In recent years, an increasing amount of research has been done in web usage mining (Yan, Jacobsen, Molina & Dayal, 1996; Cooley, Mobasher & Srivastava, 1997; Chen, Park & Yu, 1998; Wu, Yu & Ballman, 1998; Buchner, Baumgarten, Anand, Mulvenna & Hughes, 1999; Cooley, Mobasher & Srivastava, 1999; Masseglia, Poncelet & Cicchetti, 1999; Masseglia, Poncelet & Teisseire, 1999; Masseglia, Poncelet & Teisseire, 2000; Srivastava, Cooley, Deshpande & Tan, 2000).

Most existing web analysis tools (Open market web reporter, 1996; Software Inc. Webtrends, 1995; net.Genesis, 1996) provide mechanisms for reporting user activity on the servers and various forms of data filtering. With such tools, it is possible to determine the number of accesses to the server and to individual files within the organization's web space, the times or time intervals of visits, and the domain names and URLs of users of the web server. However, these tools are designed to deal with low- to moderate-traffic servers. Furthermore, they provide little or no analysis of data relationships among the accessed files and directories within the web space.
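
The kind of reporting described above amounts to simple counting over the server log. A minimal sketch, assuming the log is in the Common Log Format; the sample lines, the regular expression `CLF`, and the function `summarize` are illustrative and are not taken from any of the tools named:

```python
import re
from collections import Counter

# Assumed layout: host ident authuser [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical sample log lines.
SAMPLE = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:56:01 -0700] "GET /docs/a.html HTTP/1.0" 200 512',
    '192.0.2.1 - - [10/Oct/2000:14:02:10 -0700] "GET /index.html HTTP/1.0" 200 2326',
    'not a log line',
]


def summarize(lines):
    """Count accesses per file and per client host, skipping unparsable lines."""
    hits_per_file = Counter()
    hits_per_host = Counter()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # ignore lines that do not follow the assumed format
        hits_per_file[m["path"]] += 1
        hits_per_host[m["host"]] += 1
    return hits_per_file, hits_per_host
```

Counts like these say how often each file was requested, but nothing about which files tend to be accessed together, which is the limitation the paragraph points out.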

More sophisticated systems and techniques for the discovery and analysis of patterns are now emerging. These tools for user pattern discovery draw on techniques from AI, data mining, psychology, and information theory to mine knowledge from collected data. For example, the WEBMINER system (Mobasher, Jain, Han & Srivastava, 1996; Cooley, Mobasher & Srivastava, 1997) introduced a general architecture for web usage mining. WEBMINER automatically discovered association rules and sequential patterns from server access logs. Chen, Park and Yu (1996) introduced the finding of maximal forward references and large reference sequences. These can be used to perform various types of user traversal path analysis, such as identifying the most traversed paths through a web locality.
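
To make the notion of a maximal forward reference concrete, the sketch below follows one common reading of the idea: a revisit of a page already on the current path counts as a backward reference, and a forward run that is ended by a backward reference (or by the end of the session) is emitted as one maximal forward reference. The function name and the sample traversal are illustrative assumptions; this is not presented as the cited authors' exact algorithm.

```python
def maximal_forward_references(traversal):
    """Split one session's page sequence into maximal forward paths."""
    path = []               # pages on the current forward path
    results = []
    moving_forward = False  # True if the previous step extended the path
    for page in traversal:
        if page in path:
            # Backward reference: emit the path only if we were moving forward,
            # then back up to the revisited page.
            if moving_forward:
                results.append(list(path))
            del path[path.index(page) + 1:]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        results.append(list(path))  # the session ended on a forward move
    return results


# Hypothetical single-session traversal, one letter per page.
SESSION = list("ABCDCBEGHGWAOUOV")
PATHS = ["".join(p) for p in maximal_forward_references(SESSION)]
```

For this sample session, `PATHS` is `["ABCD", "ABEGH", "ABEGW", "AOU", "AOV"]`. Counting how often each such path, or each consecutive subsequence of it, occurs across many sessions is one way to approach the frequently traversed paths mentioned above.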

Once access patterns have been discovered, analysts need appropriate tools and techniques to understand, visualize, and interpret them. Examples of such tools include the WebViz system (Pitkow & Bharat, 1994) for visualizing path traversal patterns. Others have proposed using OLAP techniques such as data cubes to simplify the analysis of usage statistics from server access logs (Dyreson, 1997). The WEBMINER system (Mobasher, Jain, Han & Srivastava, 1996) proposes an SQL-like query mechanism for querying the discovered knowledge in association rules and sequential patterns.
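
As a rough illustration of the data cube idea applied to access logs, the sketch below counts hits along three assumed dimensions (file, day, client domain) and rolls up by aggregating over the dimensions that are left out. The dimension names, the sample records, and the function `roll_up` are hypothetical and are not drawn from the cited systems.

```python
from collections import Counter

DIMS = ("file", "day", "domain")

# Hypothetical fact records, one per access: (file, day, client domain).
FACTS = [
    ("/a.html", "Mon", "edu"),
    ("/a.html", "Mon", "com"),
    ("/a.html", "Tue", "edu"),
    ("/b.html", "Mon", "edu"),
    ("/b.html", "Tue", "com"),
]


def roll_up(facts, keep):
    """Count hits grouped by the dimensions in `keep`, summing over the rest."""
    idx = [DIMS.index(d) for d in keep]
    cube = Counter()
    for rec in facts:
        cube[tuple(rec[i] for i in idx)] += 1
    return cube


by_file = roll_up(FACTS, ("file",))                 # roll up over day and domain
by_file_day = roll_up(FACTS, ("file", "day"))       # roll up over domain only
```

Each choice of `keep` corresponds to one view of the cube, so an analyst can move between coarse views (hits per file) and finer ones (hits per file per day) without rescanning the raw log format.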
