Several new technologies and architectures now offer companies
cost-effective solutions. We will discuss the SMAQ stack later in
this chapter, including how it addresses big-data-related issues
while providing a cost-effective, viable alternative to traditional
IT infrastructure.
We are not advising that you sunset all your enterprise IT platforms and adopt the
SMAQ stack; rather, a pragmatic approach is needed in developing a big data ecosystem
where enterprise platforms and SMAQ systems can coexist to deliver cost-effective solutions
for the enterprise. We will discuss these approaches at length in Chapters 4, 5, and 6.
Note
Data Quality: There is a debate as to whether data quality principles should be
applied to big data scenarios. Data quality does have a role to play in big data,
as it ensures that the data is well formed, accurate, and can be trusted. However, approaching
data quality for big data by following the traditional route of data profiling, data cleansing,
and data monitoring will be extremely difficult: there is too much data to profile, and often
the structure of the data is not known with certainty. Moreover, the long time frames of the
data quality lifecycle (i.e., the approach to remediating data quality issues and delivering
“clean” data) do not lend themselves to agility, which is a key requirement for big
data analytics. Data quality issues are more pronounced with transactional data, where they
arise primarily from inadequate checks and controls at the source systems and
not so much from the volume of data.
For these reasons, we recommend that ongoing data quality
initiatives focus on resolving data quality issues for
transactional and reference/master data closer to the source,
downstream, or both. For big data scenarios, there is considerable
value in applying data quality rules to the big data sets to gauge
how well such data sets conform to those rules.
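As a rough illustration of gauging rule conformance, the following minimal Python sketch streams over a set of records once and reports the fraction that satisfies each rule. The field names (customer_id, email, amount), the rule definitions, and the sample records are hypothetical and chosen only for illustration; in practice the rules would come from your own data quality program, and the records from whichever big data store you use.

```python
import re
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Deliberately loose e-mail pattern; an assumption for this sketch only.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def has_customer_id(record: Record) -> bool:
    """Rule: the (hypothetical) customer_id field is present and non-empty."""
    return bool(record.get("customer_id"))


def email_well_formed(record: Record) -> bool:
    """Rule: the (hypothetical) email field looks like an e-mail address."""
    email = record.get("email")
    return isinstance(email, str) and EMAIL_PATTERN.fullmatch(email) is not None


def amount_non_negative(record: Record) -> bool:
    """Rule: the (hypothetical) amount field is a number >= 0."""
    amount = record.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


RULES: Dict[str, Rule] = {
    "has_customer_id": has_customer_id,
    "email_well_formed": email_well_formed,
    "amount_non_negative": amount_non_negative,
}


def conformance(records: Iterable[Record], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Return, per rule, the fraction of records that satisfy it.

    Records are consumed in a single pass, so the input can be a
    generator over a large data set rather than an in-memory list.
    """
    passed = {name: 0 for name in rules}
    total = 0
    for record in records:
        total += 1
        for name, rule in rules.items():
            if rule(record):
                passed[name] += 1
    if total == 0:
        return {name: 0.0 for name in rules}
    return {name: passed[name] / total for name in rules}


# Illustrative sample data (not from any real system).
sample = [
    {"customer_id": "C1", "email": "a@example.com", "amount": 10.0},
    {"customer_id": "", "email": "not-an-email", "amount": 5},
    {"customer_id": "C3", "email": "c@example.org", "amount": -2},
    {"customer_id": "C4", "email": None, "amount": 7},
]
report = conformance(sample, RULES)
for rule_name, rate in report.items():
    print(f"{rule_name}: {rate:.0%} conformant")
```

The sketch measures conformance without attempting to cleanse the data, which matches the recommendation above: the aim is to learn how far a data set can be trusted, not to remediate every record.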
MDM: MDM has the inherent goal of reconciling data silos across such categories as
customers, products, and assets to produce a consistent, trusted source of critical core
business data entities. However, the volume and variety of data in big data scenarios
pose serious challenges to implementing an MDM system for your enterprise.
The biggest advantage of big data sources (external to the
corporate firewall) is that they help validate your master
entities and, in many cases, enrich them. For example, using
Google e-mail IDs, Facebook IDs, and LinkedIn IDs, you can
further enrich your customer identification process and improve
your conversations with customers across multiple channels.
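The enrichment idea can be sketched in a few lines of Python. This is a simplified illustration and not a full MDM matching process: it assumes that master records and external profiles share an e-mail address that can be matched after basic normalization, and every field name, source label, and identifier below is hypothetical.

```python
from typing import Dict, Iterable, List


def normalize_email(email: str) -> str:
    """Trim whitespace and lower-case so trivially different spellings match."""
    return email.strip().lower()


def enrich_master(
    master: Iterable[Dict[str, str]],
    external_profiles: Iterable[Dict[str, str]],
) -> List[dict]:
    """Attach external channel IDs to master customers that share an e-mail.

    Each returned record is a copy of the master record with an added
    'external_ids' mapping of source name -> external identifier.
    External profiles with no matching master record are ignored.
    """
    by_email = {
        normalize_email(record["email"]): dict(record, external_ids={})
        for record in master
    }
    for profile in external_profiles:
        key = normalize_email(profile["email"])
        if key in by_email:
            by_email[key]["external_ids"][profile["source"]] = profile["id"]
    return list(by_email.values())


# Illustrative data only; identifiers are made up.
master_customers = [
    {"customer_id": "C1", "email": "Jane.Doe@example.com"},
    {"customer_id": "C2", "email": "sam@example.org"},
]
profiles = [
    {"source": "linkedin", "id": "li-123", "email": "jane.doe@example.com "},
    {"source": "facebook", "id": "fb-9", "email": "JANE.DOE@example.com"},
    {"source": "linkedin", "id": "li-777", "email": "unknown@example.net"},
]
enriched = enrich_master(master_customers, profiles)
```

Here the first customer picks up both a LinkedIn and a Facebook identifier, while the second is left unchanged. Real identity resolution usually needs more than an exact e-mail match (for example, fuzzy matching on names and addresses), so treat the matching key as a placeholder.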
Metadata Management: Metadata management aims to provide a consistent
representation and understanding of data definitions across the enterprise. However, due
to the sheer variety and diversity of data types in big data sets, scaling metadata management
to cover big data scenarios becomes very difficult and uneconomical.
 
 