Databases Reference
In-Depth Information
Often, it is desirable to divide the data evenly between partitions, facilitating the
balancing of loads over multiple storage devices. Partitioning by hash values may be a
good option to satisfy this purpose. Here is a materialized view definition that divides
the labor cost by repair date rows into three partitions based on hashing.
CREATE MATERIALIZED VIEW mv_labor_cost_by_repair_date
PARTITION BY HASH(repair_date_id)
PARTITIONS 3 STORE IN (tablespace1, tablespace2,
tablespace3)
AS
SELECT repair_date_id, sum(labor_cost)
FROM warranty_claim
GROUP BY repair_date_id;
Partition by hash may not work well in the case where the distribution of values is
highly skewed. For example, if 90% of the rows have a given value, then at least 90% of
the rows will map to the same partition, no matter how many partitions we use, and no
matter what hash function the system utilizes.
14.7.2 Microsoft's Analysis Services
Microsoft SQL Server 2005 Analysis Services currently supports OLAP and data min-
ing operations. The Analysis Manager is used to specify a data source. Many options are
supported for the data source, including Open DataBase Connectivity (ODBC) data
sources. A database connection can be established to a Microsoft SQL Server database
(or any other ODBC-compliant database). The dimension tables and fact tables are
specified using GUI screens, and the data cube is then built. There are a series of options
available including ROLAP, HOLAP, and MOLAP. There are also several options for
specifying limits on the use of aggregates. The user can specify a space limit.
Figure 14.10 shows a screen from the Storage Design Wizard. The wizard selects
views to materialize, while displaying the progress in graph form. Note that Microsoft
uses the term “aggregations” instead of materialized views in this context. OLAP sys-
tems improve performance by precalculating views and materializing the results to disk.
Queries are answered from the smaller aggregations instead of reading the large fact
table. Typically, there are far too many possible views to materialize them all, so the
OLAP system needs to pick strategic views for materialization. In Microsoft Analysis
Services, you have several options to control the process. You may specify the maximum
amount of disk space to use for the aggregates. You also have the option of specifying
the performance gain. The higher the performance gain, the more disk space is required.
The Microsoft documentation recommends a setting of about 30% for the performance
gain. Selecting a reasonable performance gain setting is problematic, since the gain is
Search WWH ::




Custom Search