Databases Reference
In-Depth Information
effect on the existing data, and needs a significant effort in ETL and reporting layers to refresh
the changes. Another partition technique already used is horizontal partitioning where the table is
partitioned by date or numeric ranges into smaller slices.
Colocation —a table and all its associated tables can be colocated in the same storage region. This
is a simple exercise but provides powerful performance benefits.
Distribution —a large table can be broken into a distributed set of smaller tables and used. The
downside is when a user asks for all the data from the table, we have to join all the underlying
tables.
New data types —several new data types like geospatial and temporal data can be used in the data
architecture and current workarounds for such data can be retired. This will provide a significant
performance boost.
New database functions —several new databases provid native functions like scalar tables and
indexed views, and can be utilized to create performance boosts.
Though there are several possibilities, data engineering can be done only if all other possibilities
have been exhausted. The reason for this is there is significant work that needs to be done in the ETL
and reporting layers if the data layer has changes. This requires more time and increases risk and
cost. Therefore, data engineering is not often a preferred technique when considering reengineering
or modernizing the data warehouse.
Modernizing the data warehouse
When you consider modernizing the data warehouse, there are several questions that need to be
addressed.
Migration options—a key question that needs to be answered for deciding the modernization
strategy and approach is the selection of one of these choices:
Do we rip/replace the existing architecture? Depending on the choice of whether you are
moving to a self-contained platform, such as the data warehouse appliance, or migrating the
entire platform, this choice and its associated impact needs to be answered.
Do we augment the existing architecture? If you choose platform reengineering as the
approach, this is a preferred approach to complete the migration process.
Migration strategy—the following questions need to be articulated to formulate the strategy to
build the roadmap for the reengineering or modernizing process:
What is the biggest problem that you are facing? A clear articulation on the nature of the
problem and the symptoms of where the problem manifests needs to be documented. The
documentation needs to include examples of the problem, including the SQL, formulas, and
metrics.
How mature is your data warehouse? A documentation of the data warehouse processes and
the associated business transformation rules needs to be analyzed for determining the maturity
of the data warehouse. This will help in planning the migration process.
How complex is your ETL process? A documented analysis of the ETL process is needed to
complete the ETL migration and determine the associated complexity. If the ETL process is
the major reason for the reengineering, you need to determine if just changing the ETL process
and its infrastructure will provide the performance and scalability.
 
Search WWH ::




Custom Search