A Method for Data Warehouse Design - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

Implement

physical schema

Implement ETL

processes

Fig. 10.12 Steps for physical design

consider both the proposed logical schema and the analytical queries specified

during the process of requirements gathering. A well-developed physical

design should enable to manage very large amounts of data, to refresh the

data warehouse with new data from the source systems, to perform complex

operations that may include joins of many tables, and to aggregate many data

items. All of these depend on the facilities provided by the DBMS regarding

storage methods, indexes, partitioning, parallel query execution, aggregation

functions, and view materialization, among other things.

As studied in Chap. 7 , partitioning consists in dividing a table into

smaller data sets, thus providing better support for the management of

very large volumes of data. For example, if it is usual that a query requests

employee names, dimension Employee can be fragmented vertically to have the

attributes FirstName , LastName ,and City in one partition and the remaining

ones in another partition. This enables more records to be retrieved into the

main memory. Also, the Sales fact table could be partitioned horizontally

according to time if it is usual that queries require the most recent data

according to some time frame, for example, years. However, to be able to

use partitioning techniques, we must have a good knowledge not only of the

consequences of having partitioned dimension and fact tables but also of

which method of partitioning may work better if the system we are working

with supports more than one (e.g., Oracle provides four types of horizontal

partitioning methods). SQL Server, as we have seen in Chap. 7 , also supports

partitioning. We have shown how the Sales fact table can be partitioned by

year. We have also explained in Chap. 7 that in Analysis Services it is possible

to partition data cubes, where the storage modes in the partitions can differ

from each other. For example, one partition can be stored in a MOLAP mode,

while another one can be stored in a ROLAP mode.

Indexing is a key feature to obtain good query performance in data

warehouses. In the physical design phase, we must define which kinds of

indexes we are going to use and over which attributes. In addition to the

typical B-tree and hashed indexes provided by the database management

systems, bitmap indexes and join indexes are used in data warehousing. We

have studied that bitmap indexes are appropriate for answering typical data

warehouse queries, in particular for filtering tuples using conditions over low-

cardinality columns. Again, the designer should be aware of the possibilities

of the DBMS that she will use. For example, we have seen in Sect. 7.8.3 that

in SQL Server it is not possible to define a bitmap index over a table column,

while this is possible in Oracle. We have also seen that column-store indexes,

which were studied in Chap. 7 , can be defined in SQL Server.

Data Warehouse Systems: Design and Implementation

Search WWH ::

Custom Search

Home