Fig. 1.3 Forms of data preparation (panels: Data Cleaning, Data Normalization, Data Transformation, Missing Values Imputation, Noise Identification, Data Integration)
1.6.1.2 Data Transformation
In this preprocessing step, the data are converted or consolidated so that the
mining process can be applied or can be made more efficient. Subtasks of data
transformation include smoothing, feature construction, aggregation or
summarization of data, normalization, discretization and generalization. Most of
them will be treated as independent tasks because data transformation, like data
cleaning, names a general family of data preprocessing techniques. The tasks that
require human supervision and depend more heavily on the data are the classical
data transformation techniques, such as report generation, the construction of
new attributes that aggregate existing ones, and the generalization of concepts,
especially in categorical attributes, for example replacing complete dates in the
database with year numbers only.
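As a rough illustration of two of the classical transformations mentioned above (generalizing a complete date to its year, and constructing a new attribute that aggregates existing ones), a minimal Python sketch could look as follows. The records and the field names (`purchase_date`, `price`, `quantity`, `total`) are hypothetical and chosen only for this example.

```python
from datetime import date

# Hypothetical records; the field names are illustrative assumptions only.
records = [
    {"customer": "A", "purchase_date": date(2012, 3, 14), "price": 10.0, "quantity": 3},
    {"customer": "B", "purchase_date": date(2013, 11, 2), "price": 4.5, "quantity": 2},
]


def transform(record):
    """Return a transformed copy of one record, leaving the input untouched."""
    out = dict(record)
    # Generalization: replace the complete date with the year number only.
    out["purchase_year"] = out.pop("purchase_date").year
    # Feature construction: a new attribute that aggregates existing ones.
    out["total"] = out["price"] * out["quantity"]
    return out


transformed = [transform(r) for r in records]
```

Working on a copy keeps the original records available, which can be useful when the chosen level of generalization has to be revised later under human supervision.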
1.6.1.3 Data Integration
Data integration comprises the merging of data from multiple data stores. This
process must be performed carefully in order to avoid redundancies and
inconsistencies in the resulting data set. Typical operations within data
integration are the identification and unification of variables and domains, the
analysis of attribute correlation, the detection of duplicate tuples and the
detection of conflicts between data values from different sources.
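Some of the operations listed above can be sketched in a few lines of Python: unifying attribute names across two sources, merging on a shared key, and reporting duplicate tuples and conflicting values. This is only one simple way to do it. The sources, the attribute names and the helper functions `unify` and `integrate` are hypothetical, and the policy of keeping the first source's value on a conflict is an assumption made for the example.

```python
# Hypothetical sources describing the same customers under different attribute names.
source_a = [
    {"id": 1, "name": "Ann", "city": "Granada"},
    {"id": 2, "name": "Bob", "city": "Madrid"},
]
source_b = [
    {"customer_id": 2, "full_name": "Bob", "city": "Madrid"},   # duplicate tuple
    {"customer_id": 1, "full_name": "Ann", "city": "Sevilla"},  # conflicting city
    {"customer_id": 3, "full_name": "Eve", "city": "Jaen"},     # new tuple
]

# Assumed mapping from the second source's variable names to the unified ones.
RENAME_B = {"customer_id": "id", "full_name": "name"}


def unify(record, rename):
    """Rename attributes so that both sources share the same variable names."""
    return {rename.get(key, key): value for key, value in record.items()}


def integrate(primary, secondary):
    """Merge two unified sources on 'id', reporting duplicates and conflicts."""
    merged = {rec["id"]: dict(rec) for rec in primary}
    duplicates, conflicts = [], []
    for rec in secondary:
        key = rec["id"]
        if key not in merged:
            merged[key] = dict(rec)
        elif merged[key] == rec:
            duplicates.append(key)  # identical tuple already present
        else:
            for attr, value in rec.items():
                # Keep the primary source's value and record any disagreement.
                known = merged[key].setdefault(attr, value)
                if known != value:
                    conflicts.append((key, attr, known, value))
    return merged, duplicates, conflicts


merged, duplicates, conflicts = integrate(
    source_a, [unify(r, RENAME_B) for r in source_b]
)
```

Here the conflicts are only collected, not resolved. In practice, deciding which source to trust usually depends on knowledge of the data stores involved.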
 
 