Database Reference
In-Depth Information
Figure 6-2.
Comparing the Titles table to DimTitles
All these differences make up a list of transformations that must be addressed as the data moves from the
Titles table to the DimTitles table.
■
The following pages have a lot of sQL programming code. if you are not a sQL programmer, many
examples will seem obscure and perhaps even difficult to read. we have endeavored to keep the examples simple to
alleviate confusion; however, the ETL process is complex and most examples can be simplified only so much. Conse-
quently, consider our listings as general examples of how a programmer could create an ETL process. not every ETL
process will be coded the same way, and they may not be this simplistic. if you happen to find these samples too
difficult, keep in mind that you may never be asked to create the sQL code on your own. nevertheless, you may be
expected to understand what some of this sQL code does. Therefore, we recommend that you focus on the explana-
tion of each process instead of the details on how the code is written.
Note
Programming Your Transformation Logic
You need to create code and programming structures to transform any data that requires it. This transformation
code can use SQL or application code, such as C#, or a combination of both.
This code is generated for you using tools such as SSIS. However, automatically generated code is not
efficient code, so you may have to optimize it yourself. For that matter, sometimes you even need to fix it before
you can use it in production.
The more you work with ETL processing, the more you will find that having a thorough understanding of the
code that performs the transformations will help you effectively create SSIS packages. Using tools that help you
create code and knowing how to optimize that code are two aspects of ETL processing that go hand in hand. Let's
take a look at some common programming techniques that are simple to implement but still provide a great deal
of benefit for your efforts.
Reducing the Data
It is unlikely that you will need to extract every column from the original table; therefore, you can simply leave out
the columns you do not want from the select clause. This simple procedure represents the first task in optimizing
your ETL process.