Metadata at this stage will include the control file (if provided) and the extract file name, size, and source system identification. All of this data can be collected as part of the audit process. Master data plays no role at this stage, as it relates more to the content of the data extracts in the processing stage.
Process stage. In this stage, data transformation and standardization, including the application of data-quality rules, is completed and the data is prepared for loading into the data warehouse, datamart, or analytical database. In this stage both metadata and master data play key roles:
Metadata is used in the data structures, rules, and data-quality processing.
Master data is used for processing and standardizing the key business entities.
Metadata is used to process audit data.
In this processing stage of data movement and management, metadata is essential to ensure the auditability and traceability of both the data and the process.
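The processing-stage roles described above can be sketched in a few lines. The rules, column names, and master-data values below are illustrative assumptions, not a specific product's API: metadata declares the structure and data-quality rules, and master data conforms the key business entities.

```python
# Hypothetical sketch: metadata-driven standardization in the process stage.
# All column names, rules, and master-data values are illustrative only.

# Metadata: declared structure and data-quality rules for an extract.
metadata = {
    "columns": ["customer_id", "country", "revenue"],
    "rules": {
        "customer_id": lambda v: v.strip().upper(),
        "country": lambda v: v.strip().title(),
        "revenue": lambda v: float(v),
    },
}

# Master data: the standardized form of a key business entity (country).
master_countries = {"Usa": "United States", "Uk": "United Kingdom"}

def standardize(record, meta, master):
    """Apply the metadata-declared rules, then conform to master data."""
    out = {}
    for col in meta["columns"]:
        value = meta["rules"][col](record[col])
        if col == "country":
            value = master.get(value, value)
        out[col] = value
    return out

raw = {"customer_id": " c-001 ", "country": " usa ", "revenue": "1250.50"}
clean = standardize(raw, metadata, master_countries)
# clean -> {'customer_id': 'C-001', 'country': 'United States', 'revenue': 1250.5}
```

Because both the rules and the master-data lookups live outside the code, new extracts can be onboarded by editing metadata rather than rewriting the transformation logic, which is the agility the text describes.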
Storage stage. In this stage the data is transformed into its final at-rest format and loaded into the target data structures. Metadata can be used to create agile processes that load and store data in a scalable and flexible architecture. Metadata used in this stage includes the loading process, data structures, audit process, and exception processing.
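As a minimal sketch of the storage-stage idea, the load statement and the audit record can both be generated from metadata. The table names and metadata fields below are assumptions for illustration, not a real schema:

```python
# Hypothetical sketch: metadata-driven loading and auditing in the storage stage.
# Target and exception table names are illustrative assumptions.
from datetime import datetime, timezone

load_metadata = {
    "target_table": "dw.sales_fact",
    "columns": ["customer_id", "country", "revenue"],
    "exception_table": "dw.sales_fact_err",   # rows failing checks go here
}

def build_insert(meta):
    """Generate a parameterized INSERT from the declared structure."""
    cols = ", ".join(meta["columns"])
    params = ", ".join(f":{c}" for c in meta["columns"])
    return f"INSERT INTO {meta['target_table']} ({cols}) VALUES ({params})"

def audit_entry(meta, row_count, status):
    """Record what was loaded, where, and when, for auditability."""
    return {
        "table": meta["target_table"],
        "rows": row_count,
        "status": status,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }

sql = build_insert(load_metadata)
entry = audit_entry(load_metadata, row_count=1000, status="loaded")
```

Adding a column or a new target then means changing `load_metadata`, not the loader itself, which is what makes the load process scalable and flexible.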
Distribution stage. In this stage data is extracted or processed for use in downstream systems. Metadata is useful in defining the different extract programs and the interfaces between the data warehouse or datamart and the downstream applications, and in auditing data usage and user activity.
In an efficiently designed system, as described in Figure 11.4, we can create an extremely scalable and powerful data processing architecture based on metadata and master data. The challenge in this situation is the processing complexity: the architecture and design of the data management platform must be compartmentalized so that the complexities of each stage are isolated within its own layer of integration. Modern data architecture design creates the need for this approach to process and manage the life cycle of data in any organization.
So far we have discussed the use of metadata and master data in creating an extremely agile and scalable solution for processing data in the modern data warehouse. The next section focuses on the processing complexities of Big Data and how we can leverage the same concepts of metadata and master data; it will additionally discuss the use of taxonomies and semantic interfaces in managing data processing within the Big Data ecosystem and the next-generation data warehouse.
Processing complexity of Big Data
The most complicated part of processing Big Data lies not just in the volume or velocity of the data, but also in its:
Variety of formats—data can be presented for processing as Excel spreadsheets, Word documents, PDF files, or OCR output, and can come from emails, content management platforms, legacy applications, and web applications. Sometimes it may be variations of the same data over many time periods where the metadata has changed significantly.
Ambiguity of data—this can arise from issues as simple as naming conventions, similar column names with different data types, or the same column storing different data types. A lack of metadata and taxonomies can significantly delay the processing of this data.
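The ambiguity problem above can be made concrete with a small sketch. The column names and declared types below are hypothetical; the point is that a metadata declaration resolves what the raw values alone cannot:

```python
# Hypothetical sketch: resolving column-level ambiguity with declared metadata.
# Without metadata, "amount" might arrive as a string, an int, or a float,
# and "order_id" as either a number or text. The declared types disambiguate.

type_metadata = {"amount": float, "order_id": str}

def coerce(record, type_meta):
    """Coerce each value to the type the metadata declares (default: str)."""
    return {col: type_meta.get(col, str)(val) for col, val in record.items()}

# The same logical columns arrive in three different representations:
sources = [
    {"order_id": 1001, "amount": "19.99"},   # amount as string, id as int
    {"order_id": "1002", "amount": 20},      # amount as int
    {"order_id": "1003", "amount": 20.5},    # amount already a float
]

conformed = [coerce(r, type_metadata) for r in sources]
# every "amount" is now a float, every "order_id" a string
```

Without the `type_metadata` declaration, each downstream consumer would have to guess the intended type per source, which is exactly the processing delay the text warns about.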