Database Reference
In-Depth Information
Now, when you create your database, you modify the SQL code to reflect your new design. Listing 18-2
shows an example of this.
Listing 18-2. Creating a Database on a Specific Drive with Modified Code
USE [master]
GO
-- Note: The OS is on C:\ drive, so we put the database file on D:\drive
CREATE DATABASE [DWPubsSales] ON PRIMARY
( NAME = 'DWPubsSales'
, FILENAME = ' D:\ MySQLDataFiles\DWPubsSales.mdf'
)
LOG ON -- Note: we put the log file on a separate drive as well!
( NAME = 'DWPubsSales_log'
, FILENAME = ' E:\ MySQLLogFile\DWPubsSales_log.LDF'
)
GO
This common design strategy represents a good performance boost at a reasonable cost, because hard drives
are relatively inexpensive. But keep in mind that you will receive no performance benefits by placing the log files
on a separate array drive if you are not performing large amounts of transactional statements, such as INSERTS ,
UPDATES , and DELETES . Therefore, placing the log file for the data warehouse database on a separate RAID array
helps only during ETL INSERT , UPDATE , and DELETE operations. It does not help during cube- and dimension-
processing operations, because that involves selecting data from the data warehouse and not transactional
statements.
Your data warehouse database can gain benefits from a RAID array; however, some developers may argue
that fault tolerance is not strictly necessary, since you can always rebuild the data warehouse from the source
data after a hard drive crash. This argument sounds good on the surface and may be appropriate in certain
circumstances, but hard drives are inexpensive, and most computers natively support RAID technology, so we
recommend investing in fault tolerance on the data warehouse anyway. This is especially true if the time it takes
to rebuild the data warehouse is prohibitive to your business.
In reality, you may or may not see a boost in performance by adding additional hard drives, depending on
the amount of data involved in the ETL operations. But adding additional hard drives into a RAID array is a good
idea, regardless of the performance gains, because of its fault tolerance aspect.
ETL Software Options
SSIS can also be tuned to perform better with some “interesting” configuration tricks. However, you will probably
find it simpler and see greater performance gains if you remember to perform most SQL transformations within
the database engine instead of trying to perform them externally in SSIS.
As we discussed in Chapter 6, most transformations can be done within SQL Server by creating SELECT
statements that utilize various built-in functions. Using SQL to do your transformations is a proven way to
increase ETL performance. Because of this, we recommend you focus on making sure that the source database
has all the resources it needs before trying to improve SSIS performance using arcane techniques.
The other half of ETL processing includes loading the data into a table at the destination. This involves using
SQL INSERT , UPDATE , and occasionally DELETE statements. In these cases, indexes can impair the loading process,
because the order of the data is maintained as the statements are being processed.
You may find that dropping the indexes in the data warehouse before it is loaded will reduce the time it takes
to load. However, while dropping indexes on the data warehouse may improve loading its tables, it will decrease
the performance of the SELECT statements that run when a cube or dimension is processed. Therefore, after the
data warehouse has been loaded with data, the dropped indexes need to be replaced.
 
Search WWH ::




Custom Search