relationship information between data entities can be effectively leveraged by big data
platforms while analyzing data from external data sources along with data from within
the enterprise.
The MDM logical integration architecture is designed to support the multiple
MDM methods of use across multiple master data domains, to maintain cross-domain
relationships, and to provide the functionality required for a collaborative environment,
taking into account hybrid architectures of relational and non-relational data
platforms. The architecture is structured to be scalable, highly available, and extensible,
and it provides the flexibility to integrate technology from a variety of vendors and to
integrate with future, as-yet-unknown systems.
Data Quality Implications for Big Data
There is a lot of literature about what is now possible given the opportunity of big data
and what organizations should be doing with it. But very little guidance has been offered
on data quality for big data.
Data management and data quality principles for big data are the same as they have
been for traditional data. But priorities may change, and certain processes, such as
metadata management, data integration, data standardization, and data quality measurement,
must be given increased emphasis. One major exception involves the time-tested
practice of clearly defining the problem. In the world of big data, where data may be used in
ways not originally intended, data elements need to be defined, organized, and created in a
way that maximizes potential use and does not hinder future utility.
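As a rough illustration (not from the original text, and using hypothetical field names), a data element defined for broad reuse might carry descriptive metadata alongside its technical definition, so future consumers can judge its fitness for purposes the original designers never anticipated:

# Hypothetical metadata record for a single data element, sketched to show
# the kind of descriptive context that keeps the element usable beyond its
# original purpose. Field names are illustrative, not a standard.
customer_email_metadata = {
    "name": "customer_email",
    "business_definition": "Primary email address supplied by the customer",
    "source_system": "CRM",
    "data_type": "string",
    "format": "RFC 5322 email address",
    "allowed_uses": ["billing contact", "marketing (with consent)"],
    "quality_expectations": {"completeness": ">= 95%", "uniqueness": "one per customer"},
    "steward": "customer data domain team",
}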
Your data quality approach for big data should be designed with several factors
in mind; it doesn't make sense to apply one data quality approach to all types of data.
You should consider where the data came from, how the data will be used, what the
workload types are, who will use the data, and, perhaps most importantly, what decisions
will be made with the data.
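As a minimal sketch of that idea (the source names, rule names, and thresholds here are hypothetical, not drawn from the original text), the checks you apply might be keyed to where the data originated and how it will be consumed:

# Different data quality checks for different sources and workloads.
quality_profiles = {
    ("internal_crm", "regulatory_reporting"): {
        "checks": ["completeness", "uniqueness", "conformance_to_standards"],
        "minimum_completeness": 0.99,
        "block_on_failure": True,    # strict: bad data stops the pipeline
    },
    ("social_media_feed", "exploratory_analytics"): {
        "checks": ["basic_parsing", "deduplication"],
        "minimum_completeness": 0.60,  # tolerate gaps for discovery work
        "block_on_failure": False,     # flag issues, don't stop the analysis
    },
}

def checks_for(source, workload):
    """Return the quality profile for a (source, workload) pair, if any."""
    return quality_profiles.get((source, workload))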
What data do you trust? Increasingly, business stakeholders and data scientists
are drawing conclusions from big data sources. Yet these data are often mined and
analyzed in ways that don't adhere to existing data governance processes. There is a
valid argument for doing it this way: if you need speed of insight and favor data discovery
over repeatable reporting, then you can't overly constrain these activities.
Traditional approaches to data quality revolve heavily around the notion of
persisting cleansed data. For years, data quality efforts have focused on finding and
correcting bad data; we use the word cleansing to represent the removal of what we don't
want. Knowing what your data is, what it should look like, and how to transform it into
submission defined the data quality handbook. Whole practices were created to track
data quality issues, establish workflows and teams to clean the data, and produce
reports showing what was done. These practices were measured against metrics
such as the number of duplicates identified, completeness of records, accuracy of
records, currency of records, and conformance to standards, to name a few. However,
when it comes to big data, how do we cleanse it?
The answer to that question is, maybe you don't. The nature of big data
doesn't lend itself to traditional data quality practices. The volume may be too large
to process. The volatility and velocity of the data make it difficult to keep track of.
The variety of data, in both scale and visibility, is ambiguous.
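To make the contrast concrete, here is a minimal sketch (in Python, with illustrative field names and sample data that are not from the original text) of the kind of record-level metrics traditional data quality programs report on; at big data volumes and velocities, even this simple pass over the data can become impractical.

from collections import Counter

def profile_records(records, key_field, required_fields):
    """Compute simple traditional data quality metrics over a batch of records.

    records: iterable of dicts; key_field: field expected to be unique;
    required_fields: fields that must be present and non-empty.
    """
    records = list(records)
    total = len(records)

    # Duplicates: records sharing a key value with at least one other record.
    key_counts = Counter(r.get(key_field) for r in records)
    duplicates = sum(count - 1 for count in key_counts.values() if count > 1)

    # Completeness: fraction of records with every required field populated.
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )

    return {
        "total_records": total,
        "duplicate_records": duplicates,
        "completeness": complete / total if total else 0.0,
    }

# Example with hypothetical data:
sample = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 1, "email": "a@example.com"},  # duplicate key
    {"customer_id": 2, "email": ""},                # incomplete record
]
print(profile_records(sample, "customer_id", ["customer_id", "email"]))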