Database Reference
In-Depth Information
Implement
physical schema
Implement ETL
processes
Fig. 10.12 Steps for physical design
consider both the proposed logical schema and the analytical queries specified
during the process of requirements gathering. A well-developed physical
design should enable to manage very large amounts of data, to refresh the
data warehouse with new data from the source systems, to perform complex
operations that may include joins of many tables, and to aggregate many data
items. All of these depend on the facilities provided by the DBMS regarding
storage methods, indexes, partitioning, parallel query execution, aggregation
functions, and view materialization, among other things.
As studied in Chap. 7 , partitioning consists in dividing a table into
smaller data sets, thus providing better support for the management of
very large volumes of data. For example, if it is usual that a query requests
employee names, dimension Employee can be fragmented vertically to have the
attributes FirstName , LastName ,and City in one partition and the remaining
ones in another partition. This enables more records to be retrieved into the
main memory. Also, the Sales fact table could be partitioned horizontally
according to time if it is usual that queries require the most recent data
according to some time frame, for example, years. However, to be able to
use partitioning techniques, we must have a good knowledge not only of the
consequences of having partitioned dimension and fact tables but also of
which method of partitioning may work better if the system we are working
with supports more than one (e.g., Oracle provides four types of horizontal
partitioning methods). SQL Server, as we have seen in Chap. 7 , also supports
partitioning. We have shown how the Sales fact table can be partitioned by
year. We have also explained in Chap. 7 that in Analysis Services it is possible
to partition data cubes, where the storage modes in the partitions can differ
from each other. For example, one partition can be stored in a MOLAP mode,
while another one can be stored in a ROLAP mode.
Indexing is a key feature to obtain good query performance in data
warehouses. In the physical design phase, we must define which kinds of
indexes we are going to use and over which attributes. In addition to the
typical B-tree and hashed indexes provided by the database management
systems, bitmap indexes and join indexes are used in data warehousing. We
have studied that bitmap indexes are appropriate for answering typical data
warehouse queries, in particular for filtering tuples using conditions over low-
cardinality columns. Again, the designer should be aware of the possibilities
of the DBMS that she will use. For example, we have seen in Sect. 7.8.3 that
in SQL Server it is not possible to define a bitmap index over a table column,
while this is possible in Oracle. We have also seen that column-store indexes,
which were studied in Chap. 7 , can be defined in SQL Server.
Search WWH ::




Custom Search