Metadata at this stage will include the control file (if provided) and the extract file name, size, and source system identification. All of this data can be collected as part of the audit process. Master data plays no role at this stage, as it relates more to the content of the data extracts in the processing stage.
Process stage. In this stage, data transformation and standardization, including the application of data-quality rules, is completed and the data is prepared for loading into the data warehouse, datamart, or analytical database. In this stage both metadata and master data play key roles:
Metadata is used in the data structures, rules, and data-quality processing.
Master data is used for processing and standardizing the key business entities.
Metadata is used to process audit data.
In this processing stage of data movement and management, metadata is essential to ensure the auditability and traceability of both the data and the process.
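The processing-stage roles described above can be sketched in a few lines. The rules, column names, and master-data values below are illustrative assumptions, not a specific product's API: metadata declares the structure and data-quality rules, and master data conforms the key business entities.

```python
# Hypothetical sketch: metadata-driven standardization in the process stage.
# All column names, rules, and master-data values are illustrative only.

# Metadata: declared structure and data-quality rules for an extract.
metadata = {
    "columns": ["customer_id", "country", "revenue"],
    "rules": {
        "customer_id": lambda v: v.strip().upper(),
        "country": lambda v: v.strip().title(),
        "revenue": lambda v: float(v),
    },
}

# Master data: the standardized form of a key business entity (country).
master_countries = {"Usa": "United States", "Uk": "United Kingdom"}

def standardize(record, meta, master):
    """Apply the metadata-declared rules, then conform to master data."""
    out = {}
    for col in meta["columns"]:
        value = meta["rules"][col](record[col])
        if col == "country":
            value = master.get(value, value)
        out[col] = value
    return out

raw = {"customer_id": " c-001 ", "country": " usa ", "revenue": "1250.50"}
clean = standardize(raw, metadata, master_countries)
# clean -> {'customer_id': 'C-001', 'country': 'United States', 'revenue': 1250.5}
```

Because both the rules and the master-data lookups live outside the code, new extracts can be onboarded by editing metadata rather than rewriting the transformation logic, which is the agility the text describes.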
Storage stage. In this stage the data is transformed into its final at-rest format and loaded into the target data structures. Metadata can be used to create agile processes that load and store data in a scalable and flexible architecture. Metadata used in this stage includes the loading process, data structures, audit process, and exception processing.
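As a minimal sketch of the storage-stage idea, the load statement and the audit record can both be generated from metadata. The table names and metadata fields below are assumptions for illustration, not a real schema:

```python
# Hypothetical sketch: metadata-driven loading and auditing in the storage stage.
# Target and exception table names are illustrative assumptions.
from datetime import datetime, timezone

load_metadata = {
    "target_table": "dw.sales_fact",
    "columns": ["customer_id", "country", "revenue"],
    "exception_table": "dw.sales_fact_err",   # rows failing checks go here
}

def build_insert(meta):
    """Generate a parameterized INSERT from the declared structure."""
    cols = ", ".join(meta["columns"])
    params = ", ".join(f":{c}" for c in meta["columns"])
    return f"INSERT INTO {meta['target_table']} ({cols}) VALUES ({params})"

def audit_entry(meta, row_count, status):
    """Record what was loaded, where, and when, for auditability."""
    return {
        "table": meta["target_table"],
        "rows": row_count,
        "status": status,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }

sql = build_insert(load_metadata)
entry = audit_entry(load_metadata, row_count=1000, status="loaded")
```

Adding a column or a new target then means changing `load_metadata`, not the loader itself, which is what makes the load process scalable and flexible.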
Distribution stage. In this stage data is extracted or processed for use in downstream systems. Metadata is useful in defining the different extract programs and the interfaces between the data warehouse or datamart and the downstream applications, and in auditing data usage and user activity.
In an efficiently designed system, as described in Figure 11.4, we can create an extremely scalable and powerful data processing architecture based on metadata and master data. The challenge in this situation is the processing complexity: the architecture and design of the data management platform must be compartmentalized so that the complexities of each stage are isolated within its own layer of integration. Modern data architecture design creates the need for this approach to process and manage the life cycle of data in any organization.
So far we have discussed the use of metadata and master data in creating an extremely agile and scalable solution for processing data in the modern data warehouse. The next section focuses on the processing complexities of Big Data and how we can leverage the same concepts of metadata and master data; it will additionally discuss the use of taxonomies and semantic interfaces in managing data processing within the Big Data ecosystem and the next-generation data warehouse.
Processing complexity of Big Data
The most complicated part of processing Big Data lies not just in the volume or velocity of the data, but also in its:
Variety of formats—data can be presented for processing as Excel spreadsheets, Word documents, PDF files, or OCR output, and can come from emails, content management platforms, legacy applications, and web applications. Sometimes it may be variations of the same data over many time periods where the metadata has changed significantly.
Ambiguity of data—this can arise from issues as simple as naming conventions, similar column names with different data types, or the same column storing different data types. A lack of metadata and taxonomies can significantly delay the processing of this data.
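The ambiguity problem above can be made concrete with a small sketch. The column names and declared types below are hypothetical; the point is that a metadata declaration resolves what the raw values alone cannot:

```python
# Hypothetical sketch: resolving column-level ambiguity with declared metadata.
# Without metadata, "amount" might arrive as a string, an int, or a float,
# and "order_id" as either a number or text. The declared types disambiguate.

type_metadata = {"amount": float, "order_id": str}

def coerce(record, type_meta):
    """Coerce each value to the type the metadata declares (default: str)."""
    return {col: type_meta.get(col, str)(val) for col, val in record.items()}

# The same logical columns arrive in three different representations:
sources = [
    {"order_id": 1001, "amount": "19.99"},   # amount as string, id as int
    {"order_id": "1002", "amount": 20},      # amount as int
    {"order_id": "1003", "amount": 20.5},    # amount already a float
]

conformed = [coerce(r, type_metadata) for r in sources]
# every "amount" is now a float, every "order_id" a string
```

Without the `type_metadata` declaration, each downstream consumer would have to guess the intended type per source, which is exactly the processing delay the text warns about.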