singularity
Each Itemset must be represented once, and only once. This is a basic
requirement of every ETL application. The initial extract step of an ETL
application must be able to recognize when it has, and has not, encountered
a set of data before. Data coming from an OLTP system can be rather tricky
that way. Fortunately, a data warehouse presents a much more stable and
controlled data source. Even so, extracting data from a data warehouse is
no time to relax on this, the most basic of ETL requirements. The Market
Basket ETL application that extracts data from the data warehouse must
control the data so that each set of data, and each row of data, passes
through to the Market Basket Table only once.
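One way to enforce singularity is to record, at two levels, what has already been loaded: the identifier of each extracted batch (a set of data) and the key of each row. The sketch below is illustrative only. The names `Row`, `SingularityGuard`, `batch_id`, and the `(itemset_id, line_no)` row key are hypothetical, and a production job would keep this state in a control table rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Row:
    """One row extracted from the data warehouse (hypothetical shape)."""
    itemset_id: str  # e.g. a transaction identifier
    line_no: int     # position of the row within its Itemset
    item: str


@dataclass
class SingularityGuard:
    """Remembers which batches and rows have already been loaded.

    Assumption: in a real ETL job this state would live in a database
    control table; in-memory sets are used here only for illustration.
    """
    loaded_batches: set[str] = field(default_factory=set)
    loaded_rows: set[tuple[str, int]] = field(default_factory=set)

    def load(self, batch_id: str, rows: list[Row], target: list[Row]) -> int:
        """Append unseen rows to target; return how many were accepted."""
        if batch_id in self.loaded_batches:
            return 0  # this set of data was already processed once
        accepted = 0
        for row in rows:
            key = (row.itemset_id, row.line_no)
            if key in self.loaded_rows:
                continue  # this row already passed through once
            self.loaded_rows.add(key)
            target.append(row)
            accepted += 1
        self.loaded_batches.add(batch_id)
        return accepted
```

Re-running the same batch then loads nothing, and a later batch that overlaps an earlier one contributes only its new rows. Checking at both levels is a design choice: the batch check catches a rerun of the whole extract cheaply, while the row check catches overlap between differently named batches.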
completeness
Each Itemset must be complete. This is another basic requirement of every
ETL application: a set of data must be delivered whole, never as a partial
set. The goal of this requirement is to meet the expectation of the
analysts. Analysts rarely state this expectation, yet it is not a minor
one: they expect that the data in an Itemset is, unless otherwise noted,
the full complement of the data in that Itemset.
• Date: If a set of data represents all the Itemsets that happened on
a specific date, then analysts assume that date is fully represented
within that set of Itemsets. All rows of data associated with the date
are presented with that date, and no rows of data that are not
associated with it are. Therefore, if an analyst looks for an Itemset
and does not see it on a specific date, it is because the Itemset did
not occur on that date. An analyst can operate with this assumption
because the data is complete for that date.
• Objects: The set of objects in an Itemset is all the objects that
occurred in that Itemset. If an object is not listed within an Itemset,
it did not occur within that Itemset. Therefore, if an analyst looks
for an object and does not see it within an Itemset, it is because the
object did not occur there. An analyst can operate with this assumption
because the data is complete for that Itemset.
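Both kinds of completeness can be checked before anything is released to analysts, provided the source supplies some control data to compare against. The sketch below assumes a hypothetical `manifest` listing, for each closed date, every Itemset and its expected row count; the names `SaleRow`, `manifest`, and `release_complete_dates` are illustrative, not part of any described system. A date is released only if all of its Itemsets are present and none is partial.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleRow:
    """One extracted row (hypothetical shape)."""
    itemset_id: str
    sale_date: str  # assumed to be an ISO date string
    item: str


def release_complete_dates(rows, manifest):
    """Return {date: {itemset_id: frozenset(objects)}} for complete dates only.

    manifest: hypothetical control data from the source, mapping
    date -> {itemset_id: expected number of rows}. Dates absent from
    the manifest are treated as still open and are never released.
    """
    # Group the extracted rows by date, then by Itemset.
    observed = defaultdict(lambda: defaultdict(list))
    for row in rows:
        observed[row.sale_date][row.itemset_id].append(row.item)

    released = {}
    for date, expected in manifest.items():
        got = observed.get(date, {})
        # Date completeness: exactly the expected Itemsets, no more, no fewer.
        if set(got) != set(expected):
            continue
        # Itemset completeness: every Itemset has all of its rows.
        if any(len(got[i]) != n for i, n in expected.items()):
            continue  # one partial Itemset holds back the whole date
        released[date] = {i: frozenset(items) for i, items in got.items()}
    return released
```

Holding back an entire date when a single Itemset is partial is deliberately strict: it preserves the analyst's assumption that an absent Itemset or object truly did not occur, at the cost of delaying otherwise good data.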