Online Analytical Mining for Web Access Patterns - Advanced Topics in Database Research

2. Methodology of implementing OLAM, which includes integration of data mining and

data warehousing techniques into a unified framework that ensures data availability,

flexibility, and an integrated information-processing environment for data analysis.

3. The resultant cluster of web pages frequently visited by users for marketing use,

which includes identifying potential customers for e-commerce, evolving the web

sites to achieve the business objectives, enhancing the quality and delivery of Internet

information services to the end user, and helping web design to improve the web site

topology.

RELATED WORK

Association Rules Discovery

The concept of association rules was first introduced in Agrawal, Imielinski and

Swami (1993). The problem of mining association rules has been studied extensively

(Harinarayan, Rajaraman & Ullman, 1996; Agrawal & Srikant, 1994; Bayardo, 1998;

Cheung, Han, Ng & Wong, 1996; Han, Karypis & Kumar, 1997; Park, Chen & Yu, 1995b;

Savasere, Omiecinski & Navathe, 1995; Fukuda, Morimoto, Morishita & Tokuyama, 1996;

Sarawagi, Thomas & Agrawal, 1998). These studies cover a broad range of topics and

variations aimed at further improving the performance of the mining

algorithms. They include fast algorithms based on the Apriori algorithm (Agrawal & Srikant,

1994), incremental updating and parallel algorithms (Cheung, Han, Ng & Wong, 1996; Park,

Chen & Yu, 1995b; Han, Karypis & Kumar, 1997), and mining of generalized, multi-level,

and multi-dimensional rules (Han & Fu, 1995; Zhao, Deshpande & Naughton, 1997). A

hash-based technique was used to reduce the size of the set of candidate k-itemsets; a scan reduction

technique was used to reduce the number of database scans; and a transaction reduction

technique was used to reduce the number of transactions scanned in future iterations (Park,

Chen & Yu, 1995a). Recently, a strategy based on partitioning the data reduced the number of

required database scans to two, a stronger effect than the other scan reduction methods

(Savasere, Omiecinski & Navathe, 1995).
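
The two-scan idea behind the partitioning strategy can be illustrated with a minimal sketch. It rests on one property: an itemset that is frequent in the whole database must be frequent in at least one partition. The function names (`local_frequent`, `partition_mine`), the brute-force local miner, and the `max_size` cap are assumptions made for illustration, not the implementation of the cited authors.

```python
from itertools import combinations
from math import ceil


def local_frequent(transactions, min_frac, max_size=3):
    """Brute-force local miner (illustrative only): itemsets whose support
    within this partition is at least min_frac."""
    threshold = ceil(min_frac * len(transactions))
    counts = {}
    for t in transactions:
        items = sorted(t)
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s for s, c in counts.items() if c >= threshold}


def partition_mine(transactions, min_frac, n_parts=2, max_size=3):
    """Find globally frequent itemsets with two passes over the data."""
    if not transactions:
        return {}
    size = ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Scan 1: the union of locally frequent itemsets forms the global candidates.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_frac, max_size)

    # Scan 2: count every candidate against the full database.
    threshold = ceil(min_frac * len(transactions))
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if set(c) <= t:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= threshold}


# Hypothetical page-visit transactions, one set of pages per session.
sessions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
frequent = partition_mine(sessions, min_frac=0.5, n_parts=2)
```

With these four sessions and a 50% minimum support, the itemsets `a`, `b`, `c`, `ab`, and `ac` survive the second scan, while `d`, `bc`, `bd`, and `abc` are locally frequent candidates that fail globally.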

Sequential Patterns Mining

The problem of mining sequential patterns is to find inter-transaction

patterns such that the presence of a set of items is followed by another item in the

timestamp-ordered transaction set. It was first introduced by Agrawal and Srikant (1995). Their

AprioriAll algorithm finds all frequent sequential patterns. Later, the same authors (Srikant &

Agrawal, 1996a) presented the GSP algorithm, which outperforms AprioriAll by up to 20 times.

The GSP algorithm is a variation of the Apriori algorithm.
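
To make the notion of a sequential pattern concrete, the following sketch checks whether a time-ordered list of transactions contains a pattern and computes the pattern's support over several users. It shows only the support-counting step, not AprioriAll or GSP; the function names and the sample sessions are hypothetical.

```python
def contains(sequence, pattern):
    """True if each itemset of `pattern` is a subset of some transaction in
    `sequence`, with the matched transactions in the same order."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(sequences, pattern):
    """Fraction of sequences (for example, one per user) containing `pattern`."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


# Hypothetical data: each user has a time-ordered list of page sets.
users = [
    [{"home"}, {"catalog", "search"}, {"cart"}],
    [{"home"}, {"cart"}],
    [{"catalog", "search"}, {"home"}, {"cart"}],
    [{"catalog"}, {"home"}],
]
s1 = support(users, [{"catalog", "search"}, {"cart"}])
s2 = support(users, [{"home"}, {"cart"}])
```

Here the pattern "catalog and search, followed by cart" is contained in two of the four sequences, and "home followed by cart" in three of them.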

Mannila, Toivonen and Verkamo (1995) presented the problem of finding frequent

episodes in only one long sequence of events. An episode is defined as a set of events occurring

with a partially defined order and within a given time bound. They generalized their work

to allow one to express arbitrary unary conditions on the individual event attributes, or to

give binary conditions on the pairs of event attributes. Their experiments were performed

using a web server-level log file.
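
As a rough illustration of the time bound, the sketch below slides a fixed-width window over a single event sequence and reports the fraction of windows in which a totally ordered (serial) episode occurs. It covers only the serial case, not partially ordered episodes or attribute conditions. Integer timestamps, the window placement, and all names are assumptions for illustration rather than the cited authors' algorithm.

```python
def occurs_serial(window_events, episode):
    """True if the events of `episode` occur in order, at strictly
    increasing times, within the time-sorted `window_events`."""
    pos = 0
    last_time = None
    for t, event in window_events:
        if (pos < len(episode) and event == episode[pos]
                and (last_time is None or t > last_time)):
            pos += 1
            last_time = t
    return pos == len(episode)


def episode_frequency(events, episode, win):
    """Fraction of windows [s, s + win) that contain the serial episode.
    Windows slide one time unit at a time and include those that only
    partially overlap the ends of the sequence. Assumes integer timestamps
    and a non-empty event list."""
    events = sorted(events)
    t_start = events[0][0]
    t_end = events[-1][0] + 1
    starts = range(t_start - win + 1, t_end)
    hits = 0
    for s in starts:
        window = [(t, e) for t, e in events if s <= t < s + win]
        if occurs_serial(window, episode):
            hits += 1
    return hits / len(starts)


# Hypothetical log events as (timestamp, event type) pairs.
log = [(1, "login"), (2, "search"), (4, "buy"), (6, "login"), (7, "buy")]
f3 = episode_frequency(log, ("login", "buy"), win=3)
f4 = episode_frequency(log, ("login", "buy"), win=4)
```

Widening the window from 3 to 4 time units lets the first login at time 1 pair with the purchase at time 4, so the episode's frequency increases.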

Oates and Cohen (1996) introduced the problem of detecting strong dependencies among

multiple streams of data. Their measure of dependency strength is based on the statistical
