Data Warehouse Concepts - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

fast, concurrent access to data. This requires transaction processing and

concurrency control capabilities, as well as recovery techniques that guarantee

data consistency. These systems are known as operational databases or

online transaction processing ( OLTP ) systems. The OLTP paradigm

is focused on transactions. In the Northwind database example, a simple

transaction could involve entering a new order, reserving the products

ordered, and, if the reorder point has been reached, issuing a purchase order

for the required products. Eventually, a user may want to know the status

of a given order. If a database is indexed following one of the techniques

described in the previous chapter, a typical OLTP query like the above would

require accessing only a few records of the database (and normally will return

a few tuples). Since OLTP systems must support heavy transaction loads,

their design should prevent update anomalies, and thus, OLTP databases

are highly normalized using the techniques studied in Chap. 2 .Thus,they

perform poorly when executing complex queries that need to join many

relational tables together or to aggregate large volumes of data. Besides,

typical operational databases contain detailed data and do not include

historical data.

The above needs called for a new paradigm specifically oriented to

analyze the data in organizational databases to support decision making.

This paradigm is called online analytical processing ( OLAP ). This

paradigm is focused on queries, in particular, analytical queries. OLAP-

oriented databases should support a heavy query load. Typical OLAP queries

over the Northwind database would ask, for example, for the total sales

amount by product and by customer or for the most ordered products by

customer. These kinds of queries involve aggregation, and thus, processing

them will require, most of the time, traversing all the records in a database

table. Indexing techniques aimed at OLTP are not ecient in this case: new

indexing and query optimization techniques are required for OLAP. It is easy

to see that normalization is not good for these queries, since it partitions the

database into many tables. Reconstructing the data would require a high

number of joins.

Therefore, the need for a different database model to support OLAP

was clear and led to the notion of data warehouses , which are (usually)

large repositories that consolidate data from different sources (internal and

external to the organization), are updated off-line (although as we will see,

this is not always the case in modern data warehouse systems), and follow

the multidimensional data model. Being dedicated analysis databases, data

warehouses can be designed and optimized to eciently support OLAP

queries. In addition, data warehouses are also used to support other kinds of

analysis tasks, like reporting, data mining, and statistical analysis.

Data warehouses and OLAP systems are based on the multidimensional

model ,whichviewsdatainan n -dimensional space, usually called a data

cube or a hypercube . A data cube is defined by dimensions and facts.

Dimensions are perspectives used to analyze the data. For example, consider

Data Warehouse Systems: Design and Implementation

Search WWH ::

Custom Search

Home