ensure one version of the truth, the integrated data model
at the enterprise data warehouse must hold the most detailed
enterprise data in a generic and versatile way, so that
corrections and re-calculations of data can use this layer as
their basis. Second, the metadata repository must include a
clear metadata catalogue, and it must be possible to extract
the metadata from the different sources in a timely manner.
Third, in the data staging area, the ETL process and the data
cleansing and standardization operations must be pre-defined
according to the data quality requirements, so that these
operations can be applied to the data from operational systems
and only qualified data is loaded into the enterprise data
warehouse. Fourth, the database platform and the ETL tool must
be tuned to meet the performance requirements. The direct link
from the enterprise data warehouse to the data access tools
also provides a “short-cut” when data marts are unavailable or
need not be placed between these two layers. Fifth, the data
marts must be designed in an “easy to understand” format so
that users can access and use them. The star-schema design is
a good example of an “easy to understand” format.
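The third requirement above (pre-defined cleansing and standardization operations, with only qualified data loaded) can be illustrated with a minimal sketch. The column names, the `standardize` function, and the three rules in `RULES` are hypothetical and chosen only for illustration; a real staging area would derive its rules from the data quality requirements of the enterprise.

```python
# Hypothetical staging step: standardize each row, then apply
# pre-defined quality rules. Only rows passing every rule qualify.

def standardize(row):
    """Standardization: collapse whitespace in names, upper-case country codes."""
    out = dict(row)
    out["name"] = " ".join(str(row.get("name") or "").split())
    out["country"] = str(row.get("country") or "").strip().upper()
    return out

# Pre-defined quality rules (assumed for this example): name -> check.
RULES = {
    "customer_id is present": lambda r: r.get("customer_id") is not None,
    "name is not empty": lambda r: r["name"] != "",
    "country is a 2-letter code": lambda r: len(r["country"]) == 2
    and r["country"].isalpha(),
}

def run_staging(rows):
    """Split rows into qualified rows and (raw row, failed rule names) pairs."""
    qualified, rejected = [], []
    for raw in rows:
        row = standardize(raw)
        failed = [name for name, check in RULES.items() if not check(row)]
        if failed:
            rejected.append((raw, failed))
        else:
            qualified.append(row)
    return qualified, rejected

source_rows = [
    {"customer_id": 1, "name": "  Ada   Lovelace ", "country": "gb"},
    {"customer_id": None, "name": "Unknown", "country": "US"},
    {"customer_id": 3, "name": "Grace Hopper", "country": "USA"},
]
qualified, rejected = run_staging(source_rows)
```

In this sketch, only the rows in `qualified` would be loaded into the enterprise data warehouse, while `rejected` keeps each failing row together with the names of the rules it violated so that it can be corrected at the source.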
part of any data warehouse. Although temporal database theory
has been well investigated in the research community (Jensen &
Snodgrass, 1999), most industry practice focuses on separating
historical data from the current status of entities,
identifying entities that are linked to either a time period
or a time point, and determining how the changes to each
record should be applied. In dimensional modeling theory and
the discussion of slowly changing dimensions, type 1 updates
(overwrite the old value) and type 2 updates (add a new row
for each version) are the ones most used in industry.
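The difference between the two update types can be sketched as follows. This is a minimal illustration, not a production design: the dimension is a plain list of dictionaries, the column names (`customer_id`, `valid_from`, `valid_to`) are assumptions, and the surrogate keys that a real type 2 dimension would normally carry are omitted.

```python
from datetime import date

def apply_type1(dimension, key, changes):
    """Type 1: overwrite attributes in place; no history is kept."""
    for row in dimension:
        if row["customer_id"] == key:
            row.update(changes)

def apply_type2(dimension, key, changes, effective):
    """Type 2: close the current row and append a new version of it."""
    current = next(
        r for r in dimension
        if r["customer_id"] == key and r["valid_to"] is None
    )
    current["valid_to"] = effective
    # The new version copies the current row, applies the changes,
    # and becomes the open-ended (current) row.
    new_row = {**current, **changes, "valid_from": effective, "valid_to": None}
    dimension.append(new_row)

dim = [{
    "customer_id": 7, "city": "Oslo", "email": "a@example.com",
    "valid_from": date(2020, 1, 1), "valid_to": None,
}]

# An e-mail correction is treated as type 1: the old value is lost.
apply_type1(dim, 7, {"email": "b@example.com"})
# A move to another city is treated as type 2: the old row is preserved.
apply_type2(dim, 7, {"city": "Bergen"}, date(2023, 6, 1))
```

After both calls the dimension holds two rows for the same customer: the closed row still showing Oslo and the current row showing Bergen. Which attributes are handled as type 1 and which as type 2 is a modeling decision made per attribute.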
Metadata is often defined as “data about data.” Data
warehouses in enterprises often operate in an environment
where different tools are involved in the architecture. The
management of the metadata is therefore often split across
these tools because of the cost of integration. The problem of
metadata integration and interpretability has recently been
attracting attention in the research community (Bernstein,
2005; Hauch, Miller, & Cardwell, 2005; Friedrich, 2005). Data
lineage is a practice that quite a few metadata and ETL tools
are establishing at different data warehouses (Cui & Widom,
2000). Data lineage records where data resides and where it
flows, so that the lifecycle of data can be managed securely
as it moves across the whole data warehouse architecture. As
indicated in Marco (2004), implementing a metadata repository
is in fact very similar to building a data warehouse.
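The idea of data lineage described above can be sketched as a small log of "source, target, step" records that can be traversed backwards. The `LineageLog` class and the dataset and step names are hypothetical and serve only as an illustration; they do not reflect how any particular metadata or ETL tool stores lineage.

```python
from collections import defaultdict

class LineageLog:
    """Records which dataset each dataset was derived from, and by which step."""

    def __init__(self):
        # target dataset -> set of (source dataset, step name)
        self._sources = defaultdict(set)

    def record(self, source, target, step):
        self._sources[target].add((source, step))

    def upstream(self, dataset):
        """Return every dataset that the given one is derived from."""
        seen = set()
        stack = [dataset]
        while stack:
            node = stack.pop()
            for source, _step in self._sources.get(node, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen

log = LineageLog()
log.record("crm.customers", "staging.customers", "extract")
log.record("staging.customers", "edw.customer", "cleanse_and_load")
log.record("erp.orders", "staging.orders", "extract")
log.record("staging.orders", "edw.order", "load")
log.record("edw.customer", "mart.sales_by_customer", "aggregate")
log.record("edw.order", "mart.sales_by_customer", "aggregate")

origins = log.upstream("mart.sales_by_customer")
```

Here `origins` contains every staging, warehouse, and operational dataset that feeds the data mart table, which is the kind of question lineage metadata is meant to answer when data has to be corrected or retired.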
Most early adopters of data warehouses have suffered from
poor data quality. Normally, data is considered to be of high
quality if it correctly represents the real-world construct to
which it refers. The dominant industry practice for data
quality is to apply quality-checking and quality-controlling
operations in the ETL processes, such as data cleansing and
data standardization. These operations are normally based on
the data integrity rules and business logic discovered in the
data modeling process. Data quality has already been a focus
in the research community (Ballou & Tayi, 1999). Data quality
at data warehouses often depends on the quality of delivered
data
Architecture Practices and Related Work
Based on the prototype architecture depicted in Figure 4, we
proceed to introduce different industry practices in data
warehouse architecture and describe the academic research
related to these practices.
The data model is an essential part of a data warehouse. The
concept of dimensional modeling has spread to almost every
data warehouse system. However, many industry practices have
shown that dimensional modeling techniques cannot hold all the
different data of a whole enterprise. Instead, a few data
warehouse model vendors, such as IBM and Teradata, provide
industry-standard models based on relational modeling and
normal form theory. Historical data is a compulsory