Testing and Tuning BI Solutions - Pro SQL Server 2012 BI Solutions

Database Reference

In-Depth Information

Now, when you create your database, you modify the SQL code to reflect your new design. Listing 18-2

shows an example of this.

Listing 18-2. Creating a Database on a Specific Drive with Modified Code

USE [master]

GO

-- Note: The OS is on C:\ drive, so we put the database file on D:\drive

CREATE DATABASE [DWPubsSales] ON PRIMARY

( NAME = 'DWPubsSales'

, FILENAME = ' D:\ MySQLDataFiles\DWPubsSales.mdf'

)

LOG ON -- Note: we put the log file on a separate drive as well!

( NAME = 'DWPubsSales_log'

, FILENAME = ' E:\ MySQLLogFile\DWPubsSales_log.LDF'

)

GO

This common design strategy represents a good performance boost at a reasonable cost, because hard drives

are relatively inexpensive. But keep in mind that you will receive no performance benefits by placing the log files

on a separate array drive if you are not performing large amounts of transactional statements, such as INSERTS ,

UPDATES , and DELETES . Therefore, placing the log file for the data warehouse database on a separate RAID array

helps only during ETL INSERT , UPDATE , and DELETE operations. It does not help during cube- and dimension-

processing operations, because that involves selecting data from the data warehouse and not transactional

statements.

Your data warehouse database can gain benefits from a RAID array; however, some developers may argue

that fault tolerance is not strictly necessary, since you can always rebuild the data warehouse from the source

data after a hard drive crash. This argument sounds good on the surface and may be appropriate in certain

circumstances, but hard drives are inexpensive, and most computers natively support RAID technology, so we

recommend investing in fault tolerance on the data warehouse anyway. This is especially true if the time it takes

to rebuild the data warehouse is prohibitive to your business.

In reality, you may or may not see a boost in performance by adding additional hard drives, depending on

the amount of data involved in the ETL operations. But adding additional hard drives into a RAID array is a good

idea, regardless of the performance gains, because of its fault tolerance aspect.

ETL Software Options

SSIS can also be tuned to perform better with some “interesting” configuration tricks. However, you will probably

find it simpler and see greater performance gains if you remember to perform most SQL transformations within

the database engine instead of trying to perform them externally in SSIS.

As we discussed in Chapter 6, most transformations can be done within SQL Server by creating SELECT

statements that utilize various built-in functions. Using SQL to do your transformations is a proven way to

increase ETL performance. Because of this, we recommend you focus on making sure that the source database

has all the resources it needs before trying to improve SSIS performance using arcane techniques.

The other half of ETL processing includes loading the data into a table at the destination. This involves using

SQL INSERT , UPDATE , and occasionally DELETE statements. In these cases, indexes can impair the loading process,

because the order of the data is maintained as the statements are being processed.

You may find that dropping the indexes in the data warehouse before it is loaded will reduce the time it takes

to load. However, while dropping indexes on the data warehouse may improve loading its tables, it will decrease

the performance of the SELECT statements that run when a cube or dimension is processed. Therefore, after the

data warehouse has been loaded with data, the dropped indexes need to be replaced.

Pro SQL Server 2012 BI Solutions

Search WWH ::

Custom Search

Home