Data-Driven Architecture for Big Data - Data Warehousing in the Age of Big Data


7. Metadata maintenance process:

● Explain how the maintenance of metadata is achieved.

● Describe the extent to which the maintenance of metadata is integrated into the warehouse development life cycle, including the versioning of metadata.

● Who maintains the metadata (e.g., Can users maintain it? Can users record comments or data-quality observations?).

8. User access to metadata:

● How will users interact with and use the metadata?

Once the data is processed through the metadata stage, a second pass is normally required with the master data set and semantic library to cleanse the data that was just processed, along with its applicable contexts and rules.

Standardize

Preparing and processing Big Data for integration with the data warehouse requires standardizing the data, which improves its quality. Standardization requires processing the data with master data components: if any keys are found in the data set, they are replaced with the master data definitions. For example, in data taken from a social media platform, keys or data attributes that can link to the master data are rare, and any matches will most likely be with geography and calendar data. But if you are processing data that is owned by the enterprise, such as contracts, customer data, or product data, the chances of finding matches with the master data are extremely high, and the output of the standardization process can be easily integrated into the data warehouse.
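
The key replacement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the master table `MASTER_CUSTOMERS`, the field names, and the `standardize` helper are all hypothetical and do not come from any particular product or from the text.

```python
# Hypothetical master data: maps source-system keys to master definitions.
MASTER_CUSTOMERS = {
    "C001": {"customer_id": 1001, "customer_name": "Acme Corp"},
    "C002": {"customer_id": 1002, "customer_name": "Globex Inc"},
}


def standardize(records, master, key_field):
    """Replace a source key with its master data definition where one exists.

    Returns two lists: records that matched the master data (key replaced)
    and records that did not match (left untouched for later handling).
    """
    matched, unmatched = [], []
    for record in records:
        definition = master.get(record.get(key_field))
        if definition is None:
            unmatched.append(record)
            continue
        # Drop the source key and merge in the master definition.
        standardized = {k: v for k, v in record.items() if k != key_field}
        standardized.update(definition)
        matched.append(standardized)
    return matched, unmatched


# Illustrative enterprise-owned data (contracts) with a customer key.
contracts = [
    {"cust_key": "C001", "contract_value": 50000},
    {"cust_key": "C999", "contract_value": 1200},
]
matched, unmatched = standardize(contracts, MASTER_CUSTOMERS, "cust_key")
```

Keeping the unmatched records in a separate list is one possible design choice; it makes the match rate visible, which is the property the text contrasts between social media data and enterprise-owned data.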

This process can be repeated multiple times for a given data set, as the business rule for each component is different.
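
One way to picture the repeated passes is as a list of per-component rules applied in sequence to the same data set. The sketch below assumes two hypothetical components, geography and calendar, each with its own rule; the function names, field names, and lookup values are illustrative only.

```python
from datetime import datetime


def geography_rule(record):
    """Hypothetical rule: expand a region code using a geography master table."""
    regions = {"US-NY": "New York, United States"}  # illustrative master data
    out = dict(record)
    if out.get("geo_code") in regions:
        out["region"] = regions[out.pop("geo_code")]
    return out


def calendar_rule(record):
    """Hypothetical rule: normalize a compact date key to an ISO calendar date."""
    out = dict(record)
    if "date_key" in out:
        parsed = datetime.strptime(out.pop("date_key"), "%Y%m%d")
        out["calendar_date"] = parsed.date().isoformat()
    return out


def run_passes(records, rules):
    """Apply each component's rule as a separate pass over the whole data set."""
    for rule in rules:
        records = [rule(record) for record in records]
    return records


posts = [{"post_id": 7, "geo_code": "US-NY", "date_key": "20130105"}]
result = run_passes(posts, [geography_rule, calendar_rule])
```

Because each rule is an independent function, a new master data component can be added as one more pass without changing the existing ones.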

Distribute stage

Big Data is distributed to downstream systems by processing it within analytical applications and

reporting systems. Using the data processing outputs from the processing stage where the metadata,

master data, and metatags are available, the data is loaded into these systems for further processing.

Another distribution technique involves exporting the data as flat files for use in other applications

like web reporting and content management platforms.
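
The flat-file distribution technique could look like the following minimal sketch, which uses Python's standard `csv` module. The record layout and the `export_flat_file` helper are assumptions made for illustration; the text does not prescribe a file format or field list.

```python
import csv
import io


def export_flat_file(records, fieldnames, stream):
    """Write processed records to a delimited flat file (CSV) with a header row.

    Fields not listed in fieldnames are skipped rather than raising an error.
    """
    writer = csv.DictWriter(stream, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Illustrative output of the processing stage.
processed = [
    {"customer_id": 1001, "customer_name": "Acme Corp", "contract_value": 50000},
    {"customer_id": 1002, "customer_name": "Globex Inc", "contract_value": 1200},
]

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
export_flat_file(processed, ["customer_id", "customer_name", "contract_value"], buffer)
flat_file_text = buffer.getvalue()
```

To write to disk instead, the same function can be passed a file opened with `open(path, "w", newline="")`, which is the mode the `csv` module documentation recommends for writers.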

The focus of this section was to provide readers with insights into how, by using a data-driven approach and incorporating master data and metadata, you can create the strong, scalable, and flexible data processing architecture needed for processing Big Data and integrating it with the data warehouse. There are additional layers of hidden complexity that are addressed as each system is implemented, since the complexities differ widely between systems and applications. In the next section we will discuss the use of machine learning techniques to process Big Data.

Machine learning

From the prior discussions we see that processing Big Data in a data-driven architecture with semantic libraries and metadata provides knowledge discovery and pattern-based processing techniques

where the user has the ability to reprocess the data multiple times using different patterns or, in other
