Transformation and Reduction
In the transformation and reduction phase of the knowledge-discovery process, data sets are reduced
to the smallest practical size through sampling or summary statistics. For example, a table of data
may be replaced by descriptive statistics such as its mean and standard deviation.
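As a minimal sketch of reduction by summary statistics, the Python below replaces each column of a small table with its mean and sample standard deviation. The table, its column names, and the `summarize` helper are hypothetical illustrations, not part of the original text.

```python
import statistics
from typing import Dict, List, Tuple

def summarize(table: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Replace each column of raw measurements with (mean, sample standard deviation)."""
    return {
        column: (statistics.mean(values), statistics.stdev(values))
        for column, values in table.items()
    }

# Hypothetical table: repeated heart-rate and systolic blood-pressure readings.
readings = {
    "heart_rate_bpm": [72.0, 75.0, 71.0, 78.0, 74.0],
    "systolic_mmHg": [118.0, 122.0, 120.0, 125.0, 115.0],
}

summary = summarize(readings)
for column, (mean, sd) in summary.items():
    print(f"{column}: mean={mean:.1f}, sd={sd:.2f}")
```

Ten stored values become four numbers; the trade-off is that individual readings can no longer be recovered from the summary.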
Transformation involves translating one type of data into another through mathematical or mapping
operations, for example, mapping numerical data onto textual data (or vice versa). Transformation
differs from the normalization performed in the preprocessing and cleaning phase of knowledge discovery:
its purpose isn't to allow data from multiple sources to be combined, but to directly support the
data-mining and knowledge-discovery process. For example, normalized data may be transformed from
floating-point values (such as 3.14) to integers to increase processing performance.
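The two kinds of transformation mentioned above can be sketched in Python, assuming fixed-point scaling (multiply by a constant factor, then round) as the float-to-integer mapping and a simple threshold table as the number-to-text mapping. The scale factor of 100, the helper names, and the temperature cut-offs are illustrative assumptions, not clinical or published values.

```python
from typing import List

def to_fixed_point(values: List[float], scale: int = 100) -> List[int]:
    """Map floats to integers by scaling and rounding (e.g. 3.14 -> 314 at scale 100)."""
    return [round(v * scale) for v in values]

def from_fixed_point(values: List[int], scale: int = 100) -> List[float]:
    """Invert the mapping; any precision finer than 1/scale has already been lost."""
    return [v / scale for v in values]

def label_temperature(celsius: float) -> str:
    """Map a numerical body temperature onto a textual category (hypothetical cut-offs)."""
    if celsius < 36.0:
        return "low"
    if celsius <= 37.5:
        return "normal"
    return "fever"

print(to_fixed_point([3.14, 2.718, 98.6]))  # [314, 272, 9860]
print([label_temperature(t) for t in (35.2, 36.8, 38.4)])  # ['low', 'normal', 'fever']
```

Note that 2.718 comes back as 2.72: the scale factor has to match the precision the later analysis actually needs.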
Data-Mining Methods
The process of data mining is concerned with extracting patterns from the data, typically using
classification, regression, link analysis, segmentation, or deviation detection (see Figure 7-2).
Classification involves mapping data into one of several predefined or newly discovered classes. In
the former case, a set of predefined examples is used to develop a model that can then classify
data culled from the data warehouse or database. In the latter case, the system develops its own
models from its analysis of the data and uses them to classify that data. In Figure 7-2, there are
three groups, or classes, of data: (A), (B), and (C). A classification rule may, for example, require
that a data point fall within a specified distance of a group's center, with that distance defined by
a numerical range or statistical spread.
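A minimal sketch of the predefined-class case, assuming two-dimensional points and Euclidean distance: each class (here A, B, and C) is summarized by the center (centroid) of its labeled examples, and a new point is assigned to the nearest center only if it lies within a maximum distance. This nearest-centroid classifier with a rejection threshold is one possible reading of the proximity rule, not necessarily the author's method; the coordinates, helper names, and the threshold of 1.5 are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def centroid(points: List[Point]) -> Point:
    """Average the x and y coordinates of a group of points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def build_model(examples: Dict[str, List[Point]]) -> Dict[str, Point]:
    """Learn one center per predefined class from labeled example points."""
    return {label: centroid(points) for label, points in examples.items()}

def classify(model: Dict[str, Point], point: Point, max_distance: float) -> Optional[str]:
    """Return the nearest class, or None if even the nearest center is farther than max_distance."""
    label, center = min(model.items(), key=lambda item: math.dist(point, item[1]))
    return label if math.dist(point, center) <= max_distance else None

# Hypothetical labeled examples for the three groups A, B, and C.
examples = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(6.0, 6.0), (7.0, 6.5), (6.5, 7.0)],
    "C": [(1.0, 7.0), (2.0, 7.5), (1.5, 6.5)],
}
model = build_model(examples)

print(classify(model, (2.0, 2.0), max_distance=1.5))  # A
print(classify(model, (6.0, 7.0), max_distance=1.5))  # B
print(classify(model, (4.0, 4.0), max_distance=1.5))  # None (too far from every center)
```

Points that are not close enough to any center are left unclassified rather than forced into the nearest group, which keeps ambiguous data out of every class.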
Figure 7-2. Data-mining methods. Classification: mapping to a class or group. Regression: statistical
analysis. Link analysis: correlation of data. Deviation detection: difference from the norm.
Segmentation: similarity function.
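The figure characterizes deviation detection as difference from the norm. One common way to quantify that difference is the z-score, the distance of a value from the mean measured in standard deviations. The sketch below flags values whose z-score exceeds a threshold; the threshold of 2.0, the `deviations` helper, and the glucose readings are hypothetical, and using the population standard deviation is an arbitrary choice.

```python
import statistics
from typing import List

def deviations(values: List[float], threshold: float = 2.0) -> List[int]:
    """Return the indices of values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all values identical: nothing deviates from the norm
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

# Hypothetical fasting blood-glucose readings (mg/dL) with one unusual value.
glucose_mg_dl = [92.0, 95.0, 90.0, 94.0, 91.0, 93.0, 160.0]
print(deviations(glucose_mg_dl))  # [6]
```

Because an extreme value also inflates the standard deviation it is measured against, this simple test can miss outliers in small samples; robust alternatives such as the median absolute deviation are often preferred.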
Data mining based on regression methods involves assigning data a continuous numerical variable