Data Quality
Data quality components can be used to ensure the cleanliness and accuracy
of information. IBM InfoSphere Information Server for Data Quality (IIS for
DQ) is a market-leading data quality product. It contains innovative features
such as information profiling and quality analysis, and address
standardization and validation. It is fully integrated into the InfoSphere
Information Server platform for quality rule development, execution of
quality jobs on Information Server's parallel processing platform, and
sharing of metadata with the enterprise metadata component. Data quality
discussions typically involve the following services:
Parsing: Separating data and parsing it into a structured format.
Standardization: Determining what data to place in which field and
ensuring that it's stored in a standard format (for example, a nine-digit
zip code).
Validation: Ensuring that data is consistent; for example, a phone
number contains an area code and the correct number of digits for its
locale. It might also include cross-field validation, such as checking the
telephone area code against a city to ensure that it's valid (for example,
area code 416 is valid for Toronto, 415 is not).
Verification: Checking data against a source of verified information
to ensure that the data is valid; for example, checking that an address
value is indeed a real and valid address.
Matching: Identifying duplicate records and merging those records
correctly.
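To make these services concrete, here is a minimal Python sketch that applies parsing, standardization, validation, and matching to phone numbers. It does not represent how InfoSphere Information Server implements them. The function names, the record layout, and the AREA_CODES_BY_CITY table are assumptions made for illustration; only the Toronto/416 pairing comes from the text above. Verification is left out because it depends on an external source of verified information.

```python
import re

# Hypothetical reference table for cross-field validation (assumption);
# a real deployment would consult an authoritative source.
AREA_CODES_BY_CITY = {"toronto": {"416"}}


def parse_phone(raw):
    """Parsing: extract the digits from a free-form North American number."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits


def standardize_phone(digits):
    """Standardization: store every number in one format, NNN-NNN-NNNN."""
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def validate_phone(digits, city):
    """Validation: check the digit count, then the area code against the city."""
    if len(digits) != 10:
        return False
    allowed = AREA_CODES_BY_CITY.get(city.strip().lower())
    return allowed is not None and digits[:3] in allowed


def match_records(records):
    """Matching: merge records that share the same standardized phone number."""
    merged = {}
    for rec in records:
        key = standardize_phone(parse_phone(rec["phone"]))
        # Later non-empty values fill in or overwrite earlier ones.
        merged.setdefault(key, {}).update({k: v for k, v in rec.items() if v})
        merged[key]["phone"] = key
    return list(merged.values())


if __name__ == "__main__":
    digits = parse_phone("(416) 555-0123")
    print(standardize_phone(digits))
    print(validate_phone(digits, "Toronto"))
    print(validate_phone(parse_phone("415-555-0123"), "Toronto"))
```

Matching here uses an exact key on the standardized value, which is the simplest possible rule. Real duplicate detection usually compares several fields and tolerates small differences, so treat this as a starting point only.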
Organizations should determine whether their Big Data sources require
quality checking before analysis, and then apply the appropriate data quality
components. A Big Data project is likely to require a focus on data quality
in three situations: when loading a data warehouse, to ensure accuracy and
completeness; when loading and analyzing new sources of Big Data that will
be integrated with a data warehouse; and when Big Data analysis depends on a
more accurate view (for example, reflecting customer insight), even if the
data is managed within Hadoop.
 