Online Analytical Mining for Web Access Patterns - Advanced Topics in Database Research

Data Warehousing

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile
collection of data for decision support applications. The construction of a data warehouse
with data cleaning and data integration is viewed as an important preprocessing step for
knowledge discovery tasks.

The construction of a large data warehouse for multidimensional analysis was proposed
by Codd, who coined the term OLAP for online analytical processing (Codd, Codd & Salley,
1993). Portions of a data warehouse are pre-computed and materialized for efficient
processing, and such a materialized multidimensional database is called a data cube
(Gray et al., 1997). From the data structure point of view, a data cube is a large
multidimensional array consisting of a set of dimensions with respect to the analyzed
data, and a set of values in each cell called measures (Chaudhuri & Dayal, 1997). From
the operational point of view, a data cube is a relational operator that computes
group-by aggregations over all possible subsets of the specified dimensions (Gray et al.,
1997). It treats each of the n aggregation attributes as a dimension of an n-dimensional
space, and each group-by over a subset of those attributes as a sub-cube, or cuboid. The
aggregate of a particular set of attribute values is a point in this space. The rapid
acceptance of this operator has led to a variant of CUBE being proposed for the SQL
standard.
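The operational view of the cube can be illustrated with a minimal sketch that computes one group-by per subset of the dimensions, so n dimensions yield 2^n cuboids. The function name `cube`, the `ALL` placeholder, and the sample web access rows are assumptions made for illustration; they are not taken from the cited papers, and a real system would not scan the base data once per cuboid.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`.

    Returns {grouped dimensions: {cell coordinates: total}}.
    """
    result = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            cells = defaultdict(int)
            for row in rows:
                # Keep the value of grouped dimensions, collapse the rest to ALL.
                key = tuple(row[d] if d in group else ALL for d in dims)
                cells[key] += row[measure]
            result[group] = dict(cells)
    return result


# Hypothetical web access data: hits per page per day.
rows = [
    {"page": "/home", "day": "Mon", "hits": 3},
    {"page": "/home", "day": "Tue", "hits": 2},
    {"page": "/cart", "day": "Mon", "hits": 1},
]
cuboids = cube(rows, ["page", "day"], "hits")
```

Here `cuboids[("page",)]` holds the totals per page with the day dimension collapsed, and `cuboids[()]` holds the single grand-total cell. Each key of an inner dictionary is one point in the n-dimensional space described above.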

View Maintenance

The view maintenance problem has been studied extensively (Mohania, Madria &
Kambayashi, 1999; Zhuge, Molina, Hammer & Widom, 1995; Griffin & Libkin, 1995;
Roussopoulos, 1997; Yang, Karlapalem & Li, 1997), and a survey of the view maintenance
literature can be found in Gupta and Mumick (1995). Ross, Srivastava and Sudarshan (1996)
proposed an exhaustive enumerative algorithm for maintaining a view defined by any
relational algebraic expression, and showed that the maintenance cost of a view can be
reduced by maintaining a set of additional views along with the original view. Blakeley,
Coburn and Larson (1989) studied how to detect whether an update to a base relation can
affect a derived relation, and under what conditions a derived relation can be updated.
Segev and Park (1989) considered the problem of maintaining a collection of simple
Select-Project views. They developed a screen test procedure to filter the tuples sent
to remote sites. Fong and Zeng (1997) presented a life cycle for developing a data
warehouse: planning; data requirement analysis and modeling; analytical database design;
data mapping and transformation; data extraction and load; automating data management
procedures; and data validation and testing.
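The idea of screening updates before they reach a view can be sketched for a single Select-Project view. This is a simplified illustration under stated assumptions, not the algorithm of any of the cited papers: the class name `SelectProjectView`, its methods, and the sample rows are hypothetical. An update is treated as irrelevant when the changed tuple fails the selection predicate, and a per-tuple count is kept so that a projected tuple disappears only when its last supporting base tuple is deleted.

```python
from collections import Counter


class SelectProjectView:
    """Incrementally maintained view: project `columns` of rows passing `predicate`."""

    def __init__(self, predicate, columns):
        self.predicate = predicate
        self.columns = columns
        # projected tuple -> number of base tuples that derive it
        self.counts = Counter()

    def _project(self, row):
        return tuple(row[c] for c in self.columns)

    def is_relevant(self, row):
        # Screen test: a tuple that fails the selection cannot affect the view.
        return self.predicate(row)

    def insert(self, row):
        if not self.is_relevant(row):
            return False  # irrelevant update, view untouched
        self.counts[self._project(row)] += 1
        return True

    def delete(self, row):
        if not self.is_relevant(row):
            return False
        key = self._project(row)
        self.counts[key] -= 1
        if self.counts[key] <= 0:
            del self.counts[key]
        return True

    def tuples(self):
        return set(self.counts)


# Hypothetical view: pages that were served successfully.
view = SelectProjectView(lambda r: r["status"] == 200, ["page"])
first = view.insert({"page": "/home", "status": 200, "user": "u1"})
second = view.insert({"page": "/home", "status": 200, "user": "u2"})
ignored = view.insert({"page": "/missing", "status": 404, "user": "u1"})
view.delete({"page": "/home", "status": 200, "user": "u1"})
remaining = view.tuples()
```

The third insert is screened out without touching the view, and `("/home",)` survives the delete because a second base tuple still derives it. Without the counts, the view would have to be recomputed from the base relation to decide whether a projected tuple should be removed.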

GENERAL ARCHITECTURE OF OLAM

In this section, we present the design and implementation of online analytical mining
(OLAM) of path traversal patterns. It is a simple, scalable and effective method for the
analysis of web usage. We integrate data mining techniques and the Sequential FP-growth
algorithm with the following aspects: data warehouse, frame model metadata, view
maintainability, and an automated/incremental discovery-driven method for data exploration.
