DATA PROFILING: INVESTIGATION AND ENFORCEMENT
Data profiling is a data investigation and quality monitoring process.
It allows the business to assess the quality of its data through metrics, to
discover or infer rules based on data, and to monitor historical metrics
about data quality, such as range of values, frequency, patterns/formats,
and sparseness. Data profiling is a key enforcement mechanism of data
governance: it examines the data to validate it against the expected rules,
and this process often leads to the discovery of new business rules.
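Here is a rough illustration of the metrics mentioned above. The Python sketch profiles a single column and reports sparseness (share of nulls), value frequencies, the range of values, and format patterns. The function names `profile_column` and `value_pattern` are hypothetical. Mapping digits to 9 and letters to A is an assumed convention, chosen to mirror the 999-99-9999 style of format mask. A real profiling tool would compute many more statistics.

```python
from collections import Counter
from typing import Any, Optional


def value_pattern(value: str) -> str:
    """Reduce a value to a format pattern: digits -> '9', letters -> 'A'."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch
        for ch in value
    )


def profile_column(values: list[Optional[Any]]) -> dict[str, Any]:
    """Compute simple profiling metrics for one column (hypothetical helper)."""
    present = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        # Sparseness: share of rows that have no value.
        "null_ratio": (len(values) - len(present)) / len(values) if values else 0.0,
        # Frequency of each distinct non-null value.
        "frequencies": Counter(present),
        # Range of values (assumes the non-null values are mutually comparable).
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        # Patterns/formats, e.g. "999-99-9999" for an SSN-like string.
        "patterns": Counter(value_pattern(str(v)) for v in present),
    }


if __name__ == "__main__":
    ssns = ["123-45-6789", "987-65-4321", None, "923-123-333X"]
    report = profile_column(ssns)
    print(report["null_ratio"])  # 0.25
    print(report["patterns"])
```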
The following are types of data profiling:
• Integration Data Profiling: Integration means conforming data to a
single enterprise value for the data element. For example, the state
code for Colorado might be 23, C, or CO in three different source
systems. There needs to be agreement on one enterprise value, such
as CO for Colorado; we would then convert 23 and C to CO. This
allows end users to run integrated queries in which CO represents
all of the Colorado data. A data profiling process is needed to
verify that these integration rules are actually being implemented.
• Domain Validation Data Profiling: The domain validation data profiling
process confirms that a column contains only valid values. For
example, the gender column can have only M, F, or null. An X
value would violate the domain validation rule.
• Format Validation Data Profiling: The format validation data profiling
process checks whether data conforms to a required standard format,
such as a phone number or a Social Security Number (SSN). For
example, an SSN has the format 999-99-9999, where each 9 stands for
a single digit. So 923-123-333X would be an invalid SSN.
• Range Checking Data Profiling: Range checking data profiling
verifies that data values fall within an expected boundary. For
example, a birth date value may be checked to verify that the person
is no more than 200 years old. A common mistake is to leave off the
first two digits of the birth year: a person born in 1959 might enter
59 instead of 1959, which would make the person almost two thousand
years old.
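The integration check can be sketched in two parts. A crosswalk maps each source system's code to the agreed enterprise value. A profiling step then counts any loaded values that are still not enterprise codes. The system names (`system_a`, `system_b`, `system_c`), the crosswalk layout, and the function names are hypothetical. Only the Colorado codes come from the example above.

```python
from collections import Counter

# Hypothetical crosswalk: each source system's code for Colorado mapped to the
# agreed enterprise value "CO". System names are illustrative only.
STATE_CROSSWALK = {
    ("system_a", "23"): "CO",
    ("system_b", "C"): "CO",
    ("system_c", "CO"): "CO",
}

# Assumed set of agreed enterprise values (only Colorado is shown here).
ENTERPRISE_STATE_CODES = {"CO"}


def conform_state(source_system: str, code: str) -> str:
    """Convert a source code to the enterprise value; unmapped codes pass through."""
    return STATE_CROSSWALK.get((source_system, code), code)


def unconformed_values(values: list[str]) -> Counter:
    """Profile loaded data: count values that are not agreed enterprise codes."""
    return Counter(v for v in values if v not in ENTERPRISE_STATE_CODES)


if __name__ == "__main__":
    loaded = [
        conform_state("system_a", "23"),
        conform_state("system_b", "C"),
        conform_state("system_c", "CO"),
        conform_state("system_b", "Colo"),  # no mapping rule, so it passes through
    ]
    print(loaded)                      # ['CO', 'CO', 'CO', 'Colo']
    print(unconformed_values(loaded))  # Counter({'Colo': 1})
```

Unmapped codes pass through unchanged rather than being dropped. This is deliberate: the profiling step can then surface them as violations for someone to review.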
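A domain validation profile can be as simple as listing every row whose value falls outside the permitted set. This minimal sketch assumes the column is available as a Python list and that null is represented as `None`. `GENDER_DOMAIN` and `domain_violations` are illustrative names, not part of any particular tool.

```python
from typing import Iterable, Optional

# Valid domain for the gender column, per the example rule: M, F, or null.
GENDER_DOMAIN = {"M", "F", None}


def domain_violations(
    values: Iterable[Optional[str]], domain: set
) -> list[tuple[int, Optional[str]]]:
    """Return (row_index, value) pairs whose value falls outside the domain."""
    return [(i, v) for i, v in enumerate(values) if v not in domain]


if __name__ == "__main__":
    genders = ["M", "F", None, "X", "F"]
    print(domain_violations(genders, GENDER_DOMAIN))  # [(3, 'X')]
```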
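For format validation, one possible approach (an assumption, not taken from the text) is to translate a mask such as 999-99-9999 into a regular expression. Each 9 becomes a digit class, and every other character is matched literally. The helper names and the US-style phone mask are hypothetical. This checks only the format, not whether an SSN was ever actually issued.

```python
import re


def mask_to_regex(mask: str) -> re.Pattern:
    """Compile a format mask where '9' means any digit and other characters are literal."""
    parts = ["[0-9]" if ch == "9" else re.escape(ch) for ch in mask]
    return re.compile("".join(parts))


def matches_format(value: str, pattern: re.Pattern) -> bool:
    """True only if the whole value matches the mask (not just a prefix)."""
    return pattern.fullmatch(value) is not None


SSN_FORMAT = mask_to_regex("999-99-9999")
# Hypothetical US phone mask, for illustration only.
PHONE_FORMAT = mask_to_regex("(999) 999-9999")

if __name__ == "__main__":
    for ssn in ["123-45-6789", "923-123-333X"]:
        print(ssn, matches_format(ssn, SSN_FORMAT))
    # 123-45-6789 True
    # 923-123-333X False
```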
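A range check on birth dates might look like the sketch below. It approximates age from the birth year alone and flags anything outside 0 to 200 years. Using the year rather than the full date is a simplifying assumption, and so is rejecting future birth years. The function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional

# Boundary from the example rule: nobody is more than 200 years old.
MAX_AGE_YEARS = 200


def birth_year_in_range(birth_year: int, today: Optional[date] = None) -> bool:
    """True if the birth year implies an age between 0 and MAX_AGE_YEARS.

    Age is approximated from the year alone (simplifying assumption).
    """
    current_year = (today or date.today()).year
    age = current_year - birth_year
    return 0 <= age <= MAX_AGE_YEARS


def out_of_range_birth_years(
    years: Iterable[int], today: Optional[date] = None
) -> list[int]:
    """Profile a column of birth years, returning every value that fails the check."""
    return [y for y in years if not birth_year_in_range(y, today)]


if __name__ == "__main__":
    as_of = date(2024, 6, 1)
    print(out_of_range_birth_years([1959, 59, 1990, 2030], as_of))  # [59, 2030]
```

The check rejects 59, which implies an age of well over a thousand years. That is exactly the two-digit-year mistake described above.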