Data Warehousing
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile
collection of data for decision support applications. The construction of a data warehouse
with data cleaning and data integration is viewed as an important preprocessing step for
knowledge discovery tasks.
The construction of a large data warehouse for multi-dimensional
analysis was proposed by Codd, who coined the term OLAP for online analytical processing (Codd,
Codd & Salley, 1993). Portions of a data warehouse are pre-computed and materialized
for efficient processing, and such a materialized multidimensional database is called a data
cube (Gray et al., 1997). From the data structure point of view, a data cube is
a large multi-dimensional array consisting of a set of dimensions that describe the
analyzed data and a set of values in each cell, called measures (Chaudhuri & Dayal, 1997).
From the operational point of view, a data cube is a relational operator that
computes group-by aggregations over all possible subsets of the specified dimensions (Gray
et al., 1997). It treats each of the n aggregation attributes as a dimension of an
n-dimensional space; the group-by over each subset of the attributes yields a sub-cube, or
cuboid. The aggregate of a particular set of attribute values is a point in this space. The
rapid acceptance of this operator has led to a variant of CUBE being proposed for the
SQL standard.
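The operational view of the data cube can be illustrated with a minimal Python sketch. It is not the implementation of Gray et al.; the function name `cube`, the `ALL` placeholder, and the toy `sales` rows are assumptions made for illustration. The sketch computes a SUM group-by for every subset of the given dimensions, so two dimensions yield 2^2 = 4 cuboids.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # hypothetical placeholder for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """Sum `measure` for every group-by over every subset of `dims`.

    Returns a dict mapping a cell (one value per dimension, or ALL) to its sum.
    """
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):  # one cuboid per subset of dimensions
            for row in rows:
                cell = tuple(row[d] if d in kept else ALL for d in dims)
                result[cell] += row[measure]
    return dict(result)


# Toy fact table (illustrative data only).
sales = [
    {"region": "east", "product": "pen", "units": 3},
    {"region": "east", "product": "ink", "units": 5},
    {"region": "west", "product": "pen", "units": 2},
]
cells = cube(sales, ["region", "product"], "units")
```

Each key of `cells` is a point in the two-dimensional space: `("east", "pen")` belongs to the finest cuboid, `("east", ALL)` to the group-by on region alone, and `(ALL, ALL)` is the grand total. This naive version rescans the rows once per cuboid; practical implementations derive coarser cuboids from finer ones.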
View Maintenance
The view maintenance problem has been studied extensively (Mohania, Madria &
Kambayashi, 1999; Zhuge, Molina, Hammer & Widom, 1995; Griffin & Libkin, 1995;
Roussopoulos, 1997; Yang, Karlapalem & Li, 1997), and a survey of the view maintenance
literature can be found in Gupta and Mumick (1995). Ross, Srivastava and Sudarshan (1996)
proposed an exhaustive enumerative algorithm for maintaining a view defined by a relational
algebra expression, and showed that the maintenance cost of a view can be reduced by
maintaining a set of additional views along with the original view. Blakeley, Coburn and
Larson (1989) studied how to detect whether an update to a base relation can affect a derived
relation, and determined when a derived relation can be updated. Segev and Park
(1989) considered the problem of maintaining a collection of simple Select-Project views.
They developed a screen test procedure to filter out the tuples sent to remote sites. Fong
and Zeng (1997) presented a life cycle for developing a data warehouse: planning,
data requirement analysis and modeling, analytical database design, data mapping and
transformation, data extraction and load, automating data management procedures, and data
validation and testing.
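The idea of screening updates before they reach a Select-Project view can be sketched as follows. This is an illustrative sketch only, not the procedure of any of the cited papers: the class `SelectProjectView`, its method names, and the web-log rows are hypothetical, and the per-tuple derivation counts are an assumption of this sketch, used so that deletions can be handled incrementally.

```python
from collections import Counter


class SelectProjectView:
    """Materialized view: project `columns` from base rows that satisfy `predicate`."""

    def __init__(self, predicate, columns):
        self.predicate = predicate
        self.columns = columns
        # Projected tuple -> number of base rows that derive it.
        self.counts = Counter()

    def _project(self, row):
        return tuple(row[c] for c in self.columns)

    def insert(self, row):
        """Apply a base-relation insert; return False if it cannot affect the view."""
        if not self.predicate(row):  # screen test: irrelevant update, nothing to do
            return False
        self.counts[self._project(row)] += 1
        return True

    def delete(self, row):
        """Apply a base-relation delete; return False if it cannot affect the view."""
        if not self.predicate(row):
            return False
        key = self._project(row)
        self.counts[key] -= 1
        if self.counts[key] <= 0:  # last derivation gone: tuple leaves the view
            del self.counts[key]
        return True

    def tuples(self):
        return set(self.counts)


# Hypothetical view: URLs of successful requests in a web log.
view = SelectProjectView(lambda r: r["status"] == 200, ["url"])
view.insert({"url": "/a", "status": 200, "user": 1})
view.insert({"url": "/a", "status": 200, "user": 2})
applied = view.insert({"url": "/b", "status": 404, "user": 3})  # screened out
view.delete({"url": "/a", "status": 200, "user": 1})
```

After these updates the view still contains `("/a",)`, because one base row continues to derive it, and the insert with status 404 never touched the view. The view is maintained from the updates alone, without recomputing it from the base relation.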
GENERAL ARCHITECTURE OF OLAM
In this section, we present the design and implementation of online analytical
mining of path traversal patterns, a simple, scalable and effective method for analyzing
web usage. We integrate data mining techniques and the Sequential FP-growth algorithm
with the following aspects: a data warehouse, frame model metadata, view maintainability,
and an automated, incremental, discovery-driven method for data exploration.