7. Metadata maintenance process:
Explain how the maintenance of metadata is achieved.
The extent to which the maintenance of metadata is integrated in the warehouse development
life cycle and versioning of metadata.
Who maintains the metadata (e.g., Can users maintain it? Can users record comments or data-
quality observations?).
8. User access to metadata:
How will users interact with and use the metadata?
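As a rough illustration of items 7 and 8, the sketch below shows one way a metadata entry could carry its own version history and accept user comments or data-quality observations. The class name `MetadataEntry` and all of its fields are hypothetical and are not taken from any particular metadata tool; this is a minimal sketch, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MetadataEntry:
    """Hypothetical metadata record with versioning and user comments."""
    name: str
    definition: str
    version: int = 1
    # Earlier (version, definition) pairs, oldest first.
    history: List[Tuple[int, str]] = field(default_factory=list)
    # (user, comment) pairs, e.g. data-quality observations.
    comments: List[Tuple[str, str]] = field(default_factory=list)

    def update_definition(self, new_definition: str) -> None:
        # Keep the old definition so earlier versions remain traceable.
        self.history.append((self.version, self.definition))
        self.definition = new_definition
        self.version += 1

    def add_comment(self, user: str, text: str) -> None:
        # Users annotate the entry without changing its definition.
        self.comments.append((user, text))
```

Separating `update_definition` from `add_comment` reflects the distinction the checklist draws between who maintains the metadata and who may only record observations about it.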
Once the data has been processed through the metadata stage, a second pass is normally required with
the master data set and semantic library to cleanse the data that was just processed, along with its
applicable contexts and rules.
Standardize
Preparing and processing Big Data for integration with the data warehouse requires standardizing
the data, which improves its quality. Standardization requires processing the data with master data
components. During master data processing, any keys found in the data set are replaced with the
master data definitions. For example, in data from a social media platform, keys or data attributes
that can link to the master data are rare, and the links that do exist will most likely be to
geography and calendar data. But if you are processing data that is owned by the enterprise, such as
contracts, customer data, or product data, the chances of finding matches with the master data are
extremely high, and the output of the standardization process can be easily integrated into the
data warehouse.
This process can be repeated multiple times for a given data set, as the business rules for each
component are different.
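The standardization pass described above can be sketched as a lookup against master data, run once per component. The `MASTER_DATA` contents, the field names, and the `_matched` bookkeeping field are all hypothetical and chosen only for illustration; real master data would come from a managed repository rather than an in-memory dictionary.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical master data: one lookup table per component (assumed names).
MASTER_DATA: Dict[str, Dict[str, str]] = {
    "customer_id": {"C001": "Acme Corp", "C002": "Globex Inc"},
    "country": {"US": "United States", "DE": "Germany"},
}


def standardize(records: List[Record], component: str,
                master: Dict[str, Dict[str, str]]) -> List[Record]:
    """One pass: replace keys of a single component with master definitions."""
    lookup = master[component]
    result = []
    for rec in records:
        new = dict(rec)  # leave the input record untouched
        raw = rec.get(component)
        if raw in lookup:
            new[component] = lookup[raw]
            # Track which components matched, building a fresh list each pass.
            new["_matched"] = rec.get("_matched", []) + [component]
        result.append(new)
    return result


def standardize_all(records: List[Record],
                    master: Dict[str, Dict[str, str]]) -> List[Record]:
    """Repeat the pass once per master data component."""
    for component in master:
        records = standardize(records, component, master)
    return records
```

A record from enterprise data with a known `customer_id` would match on both components here, while a record carrying an unknown key would match only on `country`, which mirrors the contrast between enterprise-owned and social media data.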
Distribute stage
Big Data is distributed to downstream systems by processing it within analytical applications and
reporting systems. Using the data processing outputs from the processing stage where the metadata,
master data, and metatags are available, the data is loaded into these systems for further processing.
Another distribution technique involves exporting the data as flat files for use in other applications
like web reporting and content management platforms.
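The flat-file distribution technique could look something like the following sketch, which writes processed records to a delimited text stream using Python's standard `csv` module. The function name `export_flat_file`, the pipe delimiter, and the `metatags` field are assumptions made for the example and are not specified by the text.

```python
import csv
from typing import Any, Dict, Iterable, List, TextIO


def export_flat_file(records: Iterable[Dict[str, Any]], out: TextIO,
                     fieldnames: List[str], delimiter: str = "|") -> int:
    """Write records as a delimited flat file; return the number of rows written."""
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        delimiter=delimiter,
        extrasaction="ignore",  # drop fields the downstream system did not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    count = 0
    for rec in records:
        row = dict(rec)
        # Flatten the (hypothetical) list of metatags into a single column.
        row["metatags"] = ";".join(row.get("metatags", []))
        writer.writerow(row)
        count += 1
    return count
```

Passing an open file object as `out` produces a file that a web reporting or content management platform could pick up; passing an `io.StringIO` makes the same function easy to check in memory.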
This section showed how a data-driven approach that incorporates master data and metadata can
create the strong, scalable, and flexible data processing architecture needed for processing Big
Data and integrating it with the data warehouse. Additional layers of hidden complexity are
addressed as each system is implemented, since the complexities differ widely between systems and
applications. In the next section we will discuss the use of machine learning techniques to process
Big Data.
Machine learning
From the prior discussions we see that processing Big Data in a data-driven architecture with
semantic libraries and metadata provides knowledge discovery and pattern-based processing techniques
where the user has the ability to reprocess the data multiple times using different patterns or, in other
 