measure of non-independence. An unexpectedly frequent or infrequent pattern was detected,
and the algorithm generated only rules rather than frequent sequences.
Other important data dependencies that can be discovered from the temporal
characteristics of the data are similar time sequences (Mannila, Toivonen & Verkamo,
1995; Srikant & Agrawal, 1996b). For example, we may be interested in finding the common
characteristics of all clients that visited a particular file within the time period [t1, t2].
Conversely, we may be interested in the time interval (within a day, within a week, etc.) in
which a particular file is most accessed.
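Both questions can be phrased as simple queries over access-log records. The following is a minimal sketch, assuming each record is a hypothetical (client, file, timestamp) tuple and using the hour of day as the time interval. It only selects the clients in the window and the peak hour; it does not implement the similar-time-sequence mining cited above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical access-log records: (client, file, timestamp).
LOG = [
    ("10.0.0.1", "/index.html", datetime(1999, 3, 1, 9, 15)),
    ("10.0.0.2", "/papers.html", datetime(1999, 3, 1, 9, 40)),
    ("10.0.0.3", "/papers.html", datetime(1999, 3, 1, 14, 5)),
    ("10.0.0.1", "/papers.html", datetime(1999, 3, 2, 9, 55)),
    ("10.0.0.4", "/index.html", datetime(1999, 3, 2, 18, 30)),
]

def clients_in_window(log, target_file, t1, t2):
    """Return the set of clients that accessed target_file with t1 <= time <= t2."""
    return {client for client, f, ts in log if f == target_file and t1 <= ts <= t2}

def busiest_hour(log, target_file):
    """Return (hour_of_day, hits) for the hour in which target_file is most accessed,
    or None if the file never appears in the log."""
    hours = Counter(ts.hour for _, f, ts in log if f == target_file)
    return hours.most_common(1)[0] if hours else None

day_start, day_end = datetime(1999, 3, 1, 0, 0), datetime(1999, 3, 1, 23, 59)
print(sorted(clients_in_window(LOG, "/papers.html", day_start, day_end)))  # ['10.0.0.2', '10.0.0.3']
print(busiest_hour(LOG, "/papers.html"))                                   # (9, 2)
```

The set of selected clients is the starting point for profiling their "common characteristics"; swapping `ts.hour` for `ts.weekday()` would answer the within-a-week variant.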
Much work has been done on user behavior analysis. Chen, Park and Yu (1998) explored
mining path traversal patterns in a distributed information environment, but they considered
only one ordered dimension: the forward-referenced pages/URLs accessed.
Web Usage Mining
In recent years, a growing body of research has addressed
web usage mining (Yan, Jacobsen, Molina & Dayal, 1996; Cooley, Mobasher & Srivastava,
1997; Chen, Park & Yu, 1998; Wu, Yu & Ballman, 1998; Buchner, Baumgarten, Anand,
Mulvenna & Hughes, 1999; Cooley, Mobasher & Srivastava, 1999; Masseglia, Poncelet &
Cicchetti, 1999; Masseglia, Poncelet & Teisseire, 1999; Masseglia, Poncelet & Teisseire,
2000; Srivastava, Cooley, Deshpande & Tan, 2000).
Most existing web analysis tools (Open market web reporter, 1996; Software Inc.
Webtrends, 1995; net.Genesis, 1996) provided mechanisms for reporting user activity on the
servers and various forms of data filtering. With such tools, it is possible to determine
the number of accesses to the server and to individual files within the organization's web
space, the times or time intervals of visits, and the domain names and URLs of the
web server's users. However, these tools are designed for servers with low to moderate traffic.
Furthermore, they provide little or no analysis of the relationships among the accessed files
and directories within the web space.
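The kind of report these tools produce can be sketched in a few lines. This is not how any of the cited products worked internally; it is a minimal illustration, assuming input in the NCSA Common Log Format (`host ident authuser [date] "request" status bytes`) and hypothetical sample lines. Note that it yields only independent counts per file, host, and hour, which is exactly the limitation described above: nothing relates one accessed file to another.

```python
import re
from collections import Counter
from datetime import datetime

# NCSA Common Log Format: host ident authuser [date] "method path protocol" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

def summarize(lines):
    """Count total hits, hits per file, hits per host, and hits per hour of day."""
    total, per_file, per_host, per_hour = 0, Counter(), Counter(), Counter()
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed lines instead of failing the whole report
        host, stamp, _method, path, _status, _size = m.groups()
        ts = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        total += 1
        per_file[path] += 1
        per_host[host] += 1
        per_hour[ts.hour] += 1
    return total, per_file, per_host, per_hour

# Hypothetical log lines for illustration.
SAMPLE = [
    '10.0.0.1 - - [01/Mar/1999:09:15:02 -0500] "GET /index.html HTTP/1.0" 200 2326',
    'host.example.com - - [01/Mar/1999:09:17:40 -0500] "GET /papers/tr1.ps HTTP/1.0" 200 51200',
    '10.0.0.1 - - [01/Mar/1999:10:02:11 -0500] "GET /index.html HTTP/1.0" 304 -',
    'garbage line',
]

total, per_file, per_host, per_hour = summarize(SAMPLE)
print(total, per_file.most_common(1))  # 3 [('/index.html', 2)]
```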
More sophisticated systems and techniques for discovering and analyzing patterns are
now emerging. These tools for user pattern discovery draw on techniques
from AI, data mining, psychology, and information theory to extract knowledge from
collected data. For example, the WEBMINER system (Mobasher, Jain, Han & Srivastava,
1996; Cooley, Mobasher & Srivastava, 1997) introduced a general architecture for web usage
mining. WEBMINER automatically discovered association rules and sequential patterns from
server access logs. Chen, Park and Yu (1996) introduced the mining of maximal forward
references and large reference sequences. These can be used to perform various types of
user traversal path analysis, such as identifying the most traversed paths through a web locality.
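The idea behind maximal forward references can be sketched as follows. This is a simplified version in the spirit of Chen, Park and Yu's approach, not their published algorithm: it assumes a session is just an ordered list of pages, and it treats a revisit to any page already on the current path as a backward reference, at which point the forward path built so far is emitted and truncated back to the revisited page. Large reference sequences are then found by brute-force counting of contiguous subpaths, rather than with the more efficient algorithms of the original work. The function names, the session, and the `min_support` threshold are hypothetical.

```python
from collections import Counter

def maximal_forward_references(session):
    """Split one user's page sequence into maximal forward references.

    Revisiting a page already on the current path is a backward reference:
    the path is emitted (only if the previous move was forward) and then
    truncated back to the revisited page.
    """
    result, path, forward = [], [], True
    for page in session:
        if page in path:                          # backward reference
            if forward:
                result.append(tuple(path))
            path = path[: path.index(page) + 1]
            forward = False
        else:                                     # forward reference
            path.append(page)
            forward = True
    if forward and path:                          # emit the final forward path
        result.append(tuple(path))
    return result

def large_reference_sequences(mfrs, k, min_support):
    """Return contiguous length-k subpaths contained in at least
    min_support maximal forward references (each reference counts once)."""
    counts = Counter()
    for ref in mfrs:
        counts.update({ref[i:i + k] for i in range(len(ref) - k + 1)})
    return {seq: n for seq, n in counts.items() if n >= min_support}

session = list("ABCDCBEGHGWAOUOV")               # hypothetical traversal
mfrs = maximal_forward_references(session)
print(["".join(r) for r in mfrs])                # ['ABCD', 'ABEGH', 'ABEGW', 'AOU', 'AOV']
print(sorted("".join(s) for s in large_reference_sequences(mfrs, 2, 2)))  # ['AB', 'AO', 'BE', 'EG']
```

Discarding backward moves is what lets the analysis focus on the paths users actually pushed forward along, rather than on the return trips needed to reach the next branch.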
Once access patterns have been discovered, analysts need the appropriate tools and
techniques to understand, visualize, and interpret these patterns. One example is
the WebViz system (Pitkow & Bharat, 1994) for visualizing path traversal patterns. Others have
proposed using OLAP techniques such as data cubes for simplifying the analysis of usage
statistics from server access logs (Dyreson, 1997). The WEBMINER system (Mobasher,
Jain, Han & Srivastava, 1996) proposes an SQL-like query mechanism for querying the
discovered knowledge in association rules and sequential patterns.
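To make the data-cube idea concrete, the sketch below computes hit counts for every group-by over a few log dimensions, which is what an OLAP CUBE operation produces. It does not reproduce Dyreson's design; the record fields (`file`, `domain`, `hour`), the `ALL` marker, and the sample hits are hypothetical.

```python
from collections import Counter
from itertools import combinations

ALL = "*"  # marks a dimension that has been aggregated away (rolled up)

def data_cube(records, dims):
    """Count hits for every subset of dims, like SQL's GROUP BY CUBE.

    Keys are tuples aligned with dims; ALL in a position means that
    dimension was summed over.
    """
    cube = Counter()
    for rec in records:
        for r in range(len(dims) + 1):
            for kept in combinations(dims, r):
                key = tuple(rec[d] if d in kept else ALL for d in dims)
                cube[key] += 1
    return cube

# Hypothetical pre-parsed access-log hits.
HITS = [
    {"file": "/index.html", "domain": "edu", "hour": 9},
    {"file": "/index.html", "domain": "com", "hour": 9},
    {"file": "/papers.html", "domain": "edu", "hour": 14},
]

cube = data_cube(HITS, ("file", "domain", "hour"))
print(cube[("/index.html", ALL, ALL)])  # 2: all hits on /index.html
print(cube[(ALL, "edu", ALL)])          # 2: all hits from .edu domains
print(cube[(ALL, ALL, ALL)])            # 3: grand total
```

Precomputing every aggregate this way costs 2^d entries per record for d dimensions, which is why practical cube implementations share work between group-bys instead of enumerating them independently.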