Online Analytical Mining for Web Access Patterns - Advanced Topics in Database Research


Imielinski & Swami, 1993; Agrawal & Srikant, 1994; Brin, Motwani & Silverstein, 1997; Han & Fu, 1995; Klemettinen, Mannila, Ronkainen, Toivonen & Verkamo, 1994; Miller & Yang, 1997; Ng, Lakshmanan, Han & Pang, 1998; Park, Chen & Yu, 1995; Srikant & Agrawal, 1995; Srikant & Agrawal, 1996; Savasere, Omiecinski & Navathe, 1995; Srikant, Vu & Agrawal, 1997; Toivonen, 1996). Association rule mining is a form of data mining that discovers interesting relationships among attributes in data. The discovered rules support decision making and business management. For example, a rule might state that 98% of customers who purchase a computer and a printer also buy a scanner. Because such rules are simple, easy to understand and explain, and capture important relationships among data in large databases, mining association rules from large data sets has been a popular topic in recent data mining research.

Association rule mining involves several major issues, including efficiency, scalability, usability and understandability. In real-world applications, data mining tasks are applied to data consisting of millions of tuples. Consequently, our first concern is the efficiency and scalability of association rule mining in large databases, that is, reducing the computational cost of the intensive data processing. An essential issue in association rule mining is therefore finding efficient algorithms.

The Frequent Pattern Growth (FP-growth) algorithm is an association rule mining algorithm that finds frequent itemsets but, unlike Apriori, avoids the expense of generating candidate itemsets. Because FP-growth does not need to generate and test candidate sets and requires only two scans of the database, it is a fast algorithm for mining association patterns. We investigate this algorithm in depth in the discussion of the Sequential FP-growth algorithm.
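The two database scans and the candidate-free search can be illustrated with a minimal Python sketch of the standard FP-growth procedure on unordered itemsets. This is not the chapter's Sequential FP-growth; the names `Node`, `build_tree`, `fp_growth` and `mine`, and the sample transactions, are assumptions made for illustration only.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and a link to its parent."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from a list of (items, count) pairs in two passes."""
    # Pass 1: count how often each item occurs and drop infrequent items.
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}

    # Pass 2: insert each transaction with its frequent items in a fixed
    # order (descending frequency, ties broken by name) so prefixes are shared.
    root = Node(None, None)
    links = defaultdict(list)  # item -> every tree node carrying that item
    for items, count in weighted_transactions:
        kept = sorted((i for i in set(items) if i in freq),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                links[item].append(child)
            child.count += count
            node = child
    return freq, links


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} without generating candidates."""
    freq, links = build_tree(weighted_transactions, min_support)
    result = {}
    for item, item_support in freq.items():
        itemset = suffix | {item}
        result[itemset] = item_support
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in links[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_support, itemset))
    return result


def mine(transactions, min_support):
    """Mine frequent itemsets from a list of plain transactions."""
    return fp_growth([(t, 1) for t in transactions], min_support)


transactions = [
    ["computer", "printer", "scanner"],
    ["computer", "printer", "scanner"],
    ["computer", "printer"],
    ["computer", "scanner"],
    ["printer"],
]
for itemset, count in sorted(mine(transactions, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The original data is read exactly twice, once to count items and once to build the tree; all further work happens on the compressed tree and its conditional pattern bases.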

We propose and develop a method, called online analytical mining of path traversal patterns, which integrates the recently developed data warehouse technology with an efficient association mining method. The system stores the derived web user access paths in a data warehouse and maintains its views through frame metadata (Fong & Huang, 1997). The data operation functions in the frame metadata update the user access path patterns in the data warehouse. Whenever a new user access path occurs, a constraint class in the frame metadata triggers view maintenance. The data warehouse is then analyzed using the frequent pattern tree of user access paths on the web site within a given period. The method achieves incremental, extensible, and multi-dimensional association rule mining with high performance.
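The idea of trigger-driven, incremental view maintenance can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `PathWarehouse`, its methods, and the prefix-count view are hypothetical stand-ins and do not reproduce the frame metadata or constraint class of Fong & Huang (1997).

```python
from collections import defaultdict


class PathWarehouse:
    """Hypothetical stand-in for the warehouse of user access paths."""

    def __init__(self):
        self.paths = []                        # stored raw access paths
        self.prefix_counts = defaultdict(int)  # derived view: prefix -> count
        self.triggers = [self._refresh_view]   # callbacks fired on each insert

    def insert(self, path):
        """Store a new access path and fire every registered trigger."""
        path = tuple(path)
        self.paths.append(path)
        for trigger in self.triggers:
            trigger(path)

    def _refresh_view(self, path):
        # Incremental update: only the prefixes of the new path are touched,
        # so the stored paths are never rescanned.
        for end in range(1, len(path) + 1):
            self.prefix_counts[path[:end]] += 1

    def frequent_prefixes(self, min_support):
        """Return the path prefixes seen at least `min_support` times."""
        return {p: c for p, c in self.prefix_counts.items() if c >= min_support}


warehouse = PathWarehouse()
warehouse.insert(["/home", "/products", "/cart"])
warehouse.insert(["/home", "/products"])
warehouse.insert(["/home", "/about"])
print(warehouse.frequent_prefixes(2))
```

The design point is that each insert pays only for the new path, which is what makes the derived pattern view cheap to keep current as users browse the site.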

Association Rules

Association rules resemble classification rules. Association rule mining is a form of data mining used to discover interesting relationships among attributes in data. It discovers interesting associations or correlations within a large set of data: it identifies sets of attribute values (predicates or items) that frequently occur together, and then formulates rules that characterize these relationships. In general, an association rule indicates that occurrences of $A_1, A_2, \ldots, A_i$ will most likely be associated with occurrences of $B_1, B_2, \ldots, B_j$:

$$A_1, A_2, \ldots, A_i \rightarrow B_1, B_2, \ldots, B_j$$

where each $A$ and $B$ term is a predicate or item. Such rules are usually interpreted as, "When items $A_1, A_2, \ldots, A_i$ occur, items $B_1, B_2, \ldots, B_j$ will occur as well in the same transaction."
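A small Python sketch can show how the strength of such a rule might be measured. It uses the standard support and confidence measures, which this section does not itself define; the function names and the sample transactions are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_antecedent if n_antecedent else 0.0


transactions = [
    ["computer", "printer", "scanner"],
    ["computer", "printer", "scanner"],
    ["computer", "printer"],
    ["computer", "scanner"],
    ["printer"],
]
# Rule: computer, printer -> scanner
print(support(transactions, ["computer", "printer", "scanner"]))
print(confidence(transactions, ["computer", "printer"], ["scanner"]))
```

In this toy data the rule holds in two of the three transactions that contain both a computer and a printer; a figure such as "98% of customers" in a rule statement corresponds to this confidence value.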
