Data architecture
From the perspective of the data warehouse, the data architecture covers the key steps of acquiring,
modeling, cleansing, preprocessing, processing, and integrating data from the source systems to the
EDW, outlined here:
1. Business requirements analysis:
In this step the key business requirements are gathered from business users and sponsors.
The requirements outline the data needs from an analysis perspective, including requirements
for data availability, accessibility, and security.
2. Data analysis:
In this step the data from the OLTP systems is analyzed for data types, business rules, quality, and
granularity.
Any special requirements for data are discovered and documented in this step.
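As an illustration of this profiling work, a simple pass over an OLTP extract can record, per column, the inferred data types, the null count, and the distinct-value count (a rough proxy for granularity). This is a minimal sketch; the column names and sample rows are hypothetical.

```python
def profile(rows):
    """Profile a list of row dicts: per column, report inferred types,
    null count, and distinct-value count."""
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "types": sorted({type(v).__name__ for v in non_null}),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return report

# Hypothetical OLTP sample rows
orders = [
    {"order_id": 1, "customer": "acme", "amount": 120.5},
    {"order_id": 2, "customer": "acme", "amount": None},
    {"order_id": 3, "customer": "globex", "amount": 75.0},
]
print(profile(orders))
```

A profile like this surfaces the special data requirements (unexpected nulls, mixed types, low-cardinality codes) that are documented in this step.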
3. Data modeling:
In this step the OLTP data models are converted to the data warehouse's relational model. The
modeling approach can be 3NF, star schema, or snowflake.
The key subject areas and their relationships are designed.
The hierarchies are defined.
The foundation of the physical database design is laid in this step.
The staging schema is created.
The EDW schema is created.
If an ODS is defined and modeled, the ODS data model is defined and created.
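To make the star-schema option concrete, a minimal fact table joined to two dimension tables might be declared as follows, using SQLite from Python's standard library. All table and column names here are hypothetical, not part of any particular EDW design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes keyed by surrogate keys.
cur.execute("""CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT, year INTEGER, quarter INTEGER, month INTEGER)""")
cur.execute("""CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT, category TEXT)""")

# Fact table: foreign keys to the dimensions plus additive measures.
cur.execute("""CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER, sale_amount REAL)""")

tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['dim_date', 'dim_product', 'fact_sales']
```

A snowflake variant would further normalize the dimensions (e.g. splitting `category` into its own table); 3NF would model the subject areas without a central fact table.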
4. Data movement:
In this step the process of extracting, loading, and transforming data is designed,
developed, and implemented.
Three distinct processes are typically developed; some designs may do less and others
more, but the fundamental steps are:
- Source extract
- Staging loading
- Staging extract and EDW loading
Data transformations are applied in this phase to transform data from the OLTP model to the
EDW model.
Any errors occurring in this stage of processing are captured and reprocessed later.
Data movement and processing need to be audited for accuracy, with record counts captured at
each step of the process. This is typically accomplished by implementing an audit, balance,
and control process that traces data from its arrival in the data warehouse to its final
delivery to datamarts and analytical data stores.
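The three movement steps above, with record counts captured for an audit, balance, and control trail, can be sketched as follows. The source rows and the transformation are hypothetical; the point is the per-step counting and the reconciliation check at the end.

```python
audit = []  # audit/balance/control trail: (step, record_count)

def source_extract():
    # Hypothetical OLTP extract
    rows = [{"id": 1, "amt": "10.5"}, {"id": 2, "amt": "20"}]
    audit.append(("source_extract", len(rows)))
    return rows

def staging_load(rows):
    staging = list(rows)  # land the data unchanged in staging
    audit.append(("staging_load", len(staging)))
    return staging

def edw_load(staging):
    edw, errors = [], []
    for row in staging:
        try:
            # transform from the OLTP model to the EDW model
            edw.append({"id": row["id"], "amount": float(row["amt"])})
        except (KeyError, ValueError):
            errors.append(row)  # captured for later reprocessing
    audit.append(("edw_load", len(edw)))
    return edw, errors

edw, errors = edw_load(staging_load(source_extract()))
# Balance check: counts must reconcile at every step of the process
assert audit[0][1] == audit[1][1] == audit[2][1] + len(errors)
```

The balance assertion is the essence of the audit, balance, and control process: every record that arrives must be accounted for, either in the EDW load or in the error queue.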
5. Data quality:
In this step, typically performed both in the ETL steps and in the staging database, the data from
the source databases is scrubbed to remove any data-quality issues that would eventually lead to
data corruption or referential integrity problems.
Specialized third-party tools can be deployed to support data quality.
It is critical that errors in this step are discovered and marked for reprocessing.
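A minimal in-staging scrubbing pass, which validates rows against simple rules and marks failures for reprocessing rather than loading them, might look like this. The rules, row layout, and dimension keys are hypothetical.

```python
def scrub(rows, valid_customer_keys):
    """Split staging rows into clean rows and rows marked for reprocessing."""
    clean, reprocess = [], []
    for row in rows:
        problems = []
        if row.get("amount") is None or row["amount"] < 0:
            problems.append("bad amount")
        # Referential-integrity check against a dimension's surrogate keys
        if row.get("customer_key") not in valid_customer_keys:
            problems.append("unknown customer_key")
        if problems:
            reprocess.append({**row, "errors": problems})
        else:
            clean.append(row)
    return clean, reprocess

rows = [
    {"customer_key": 10, "amount": 99.0},
    {"customer_key": 99, "amount": -5.0},  # fails both rules
]
clean, reprocess = scrub(rows, valid_customer_keys={10, 20})
print(len(clean), len(reprocess))  # 1 1
```

Rejected rows carry their error list, so they can be corrected and fed back through the same pipeline rather than silently dropped.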