there is a column in the CleanTemperature table that contains the Average Dew Point
for a given hour. Knowing your data means that you understand how and when it
changes—or even if it changes. Certain types of data, like weather data, do not get updated after they are captured and recorded.
If you have read Tim Mitchell's description of a typical incremental load pattern in Chapter 11, you will note a Lookup transform configured to connect to the destination and read data. In MASD, you will pay for that data read. There are almost 30,000 rows present in my MASD AndyWeather database. If I load all the rows from my source into a Data Flow task and use a Lookup transform to “join” between my source and MASD, I pay for reading rows that haven't changed. What's more, I know they will never change. Since weather data is never updated once recorded, there will simply be no Changed Rows.
Each hour, a few rows of new data are generated for each subject area in the AndyWeather database. If I use a Lookup, I load all 30,000 rows for no good reason—and I pay for the privilege. No thank you.
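To make that cost concrete, consider the kind of query a full-cache Lookup effectively issues against the MASD destination on every execution. The column names below are illustrative (I am assuming an identity column named ID and a measurement timestamp on CleanTemperature), but the point stands regardless of schema: every destination row is read, every time.

-- What a full-cache Lookup reads from the MASD destination on each run:
-- every existing row, whether or not anything changed.
-- (Illustrative column names.)
SELECT ID, MeasureDateTime
FROM dbo.CleanTemperature;   -- ~30,000 rows billed per execution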
To limit the amount of data read, and thereby lower the total cost of the solution, I could execute a query that selects a “marker” indicating the latest or last row loaded. For this, I could select the maximum value from a field containing the date and time the table was last loaded; something like Max(LastUpdatedDateTime) or even a data integration lineage or metadata field like Max(LoadDate). I could similarly select another marker such as the Max(ID) from an integer column reliably maintained by a sequence, identity, trigger, or other mechanism. The reliability of the mechanism represents the maximum confidence a data integration architect can place in the value. I will demonstrate building an incremental loader using an identity column maintained on the source data.
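Here is a minimal sketch of the two queries that approach implies, assuming a source table named dbo.CleanTemperature with an identity column named ID (the table, column, and variable names are illustrative, not the actual AndyWeather schema). The first query runs against the MASD destination and returns the marker; the second runs against the source and returns only the rows added since the last load.

-- 1. Ask the MASD destination for the last ID already loaded.
--    This returns a single row instead of ~30,000.
SELECT COALESCE(MAX(ID), 0) AS LastLoadedID
FROM dbo.CleanTemperature;

-- 2. Pull only the new rows from the source, filtered by the marker.
--    @LastLoadedID holds the result of the first query (in SSIS, this is
--    typically captured by an Execute SQL task into a package variable).
SELECT ID, MeasureDateTime, AverageDewPoint
FROM dbo.CleanTemperature
WHERE ID > @LastLoadedID;

The reliability caveat above applies directly here: this filter is only as trustworthy as the mechanism that maintains the ID column.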
Before I do, I wish to point out that Chapter 11 contains a great section on the incremental load design pattern. There is a discussion of another solution I will not touch upon: change data capture. I encourage you to review Chapter 11 before you complete your data integration design.
Building the Cloud Loader
To work through the demonstration that follows, you will need a MASD account and database. Creating the account and database is beyond the scope of this topic, but you can learn more at www.windowsazure.com/en-us/home/features/sql-