applied to semiconductor application data for basic feasibility demonstration
in Section 2.4.
2.3.2 Dimensionality Reduction and Interactive Visualization
Motivation. Beyond semiconductor manufacturing, a wide variety of other
technical problems are characterized by large sets of high-dimensional
data, obtained, e.g., from sensor registration, medical laboratory
parameters, manufacturing process parameters, financial databases,
measurements, or other generally observed features. For a given
application, the significance, correlations, redundancy, and irrelevancy
of the variables x_i
are a priori unknown. Extracting the underlying knowledge, or classifying
the data reliably and automatically, requires reducing the initial data
set to the essential information and the corresponding variables. This
holds in particular because the well-known curse of dimensionality (COD)
[2.12] makes compaction of the data a mandatory prerequisite for reliable
decision making. Both unsupervised and supervised methods can be employed
for this reduction step, in interactive as well as automatic processing of
the data. The exploitation of the
remarkable human perceptive and associative capabilities for the complex
problem of identifying nonobvious correlations, structure, and hidden
knowledge in the data can be a powerful complement to existing
computational methods. This requires an appropriate visual representation,
which can be achieved by means of dimensionality reduction or multivariate
projection methods combined with interactive visualization of the data [2.49].
A typical database representation, e.g., as an Excel spreadsheet, is not
easily amenable to human perception and understanding. This is illustrated
in Fig. 2.10, together with the alternative human-adapted visual
representation of the same database. Thus, dimensionality reduction is a
ubiquitous problem and, together with multivariate data visualization, has
been a topic of interest and interdisciplinary research for more than
three decades. Applications of high
economic interest, e.g., the one investigated in this work and other data
mining and knowledge discovery applications, give renewed strong incentive
to the field. Numerous methods for dimensionality reduction have been
derived in the past; they differ considerably in methodology,
computational complexity, transparency, and ease of use. In this work,
effective methods promising the best productivity increase will be
preferred. The following
common definitions of two main groups of dimensionality reduction methods,
briefly adapted from [2.16], shall clarify the pursued objectives. For a given
sample set X with N M-dimensional feature vectors x = [x_1, x_2, ..., x_M]^T,
feature extraction is defined as a transformation

\[ J(A) = \max_{A} J\bigl(A(v)\bigr) \tag{2.5} \]

and the special case of feature selection is defined as a transformation

\[ J(A_S) = \max_{A_S} J(A_S). \tag{2.6} \]
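To make the feature selection definition concrete, the following is a minimal sketch in Python. It is not the method used in this work. It assumes a criterion J that the text does not specify (here, the ratio of between-class to within-class scatter) and finds the maximizing subset by exhaustive search over all subsets of a fixed size k. The names `separability` and `select_features` and the toy data are hypothetical and serve illustration only.

```python
from itertools import combinations

import numpy as np


def separability(X, y):
    """Assumed criterion J: between-class scatter divided by within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        # Squared distance of the class mean from the overall mean, weighted by class size.
        between += len(Xc) * np.sum((class_mean - overall_mean) ** 2)
        # Squared deviations of the class samples from their own class mean.
        within += np.sum((Xc - class_mean) ** 2)
    return between / (within + 1e-12)  # small constant guards against division by zero


def select_features(X, y, k, criterion=separability):
    """Evaluate J on every k-feature subset and return the maximizing subset."""
    best_subset, best_score = None, -np.inf
    for subset in combinations(range(X.shape[1]), k):
        score = criterion(X[:, list(subset)], y)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data (assumption): N = 200 samples, M = 5 features, two classes.
# Only features 1 and 3 are shifted by the class label; the rest are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5))
X[:, 1] += 3.0 * y
X[:, 3] -= 3.0 * y

subset, score = select_features(X, y, k=2)
print("selected features:", subset, "J =", round(score, 3))
```

On this toy data the search is expected to pick the two informative features. Exhaustive search is only feasible for small M, since the number of subsets grows combinatorially; it is used here because it maximizes J over the candidate subsets directly, as the definition states.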