Database Reference
In-Depth Information
∆y iq ( m )= ∂E ( m )
∂y iq ( m )
2 E ( m )
∂y iq ( m ) 2
and 0 <MF≤
1 .
(2.12)
In particular for large databases, due to the underlying computational com-
plexity of the standard methods, e.g., the NLM with O( N 2 ), mapping compu-
tation becomes infeasible. Therefore, particular interest was placed on heuris-
tic and hierarchical methods of dimensionality reduction as mapping accel-
erators.
One of the first heuristic accelerating methods of the NLM was pub-
lished by Lee, Slaggle, and Blum [2.34]. Rightly assuming that the gradient
procedure does not always achieve an accurate projection (cf., e.g., [2.9]),
they developed a fast distance-preserving mapping that focuses on the ex-
act preservation of only a limited number of 2 N −
3 distances, neglecting
all remaining ones. For this mapping, the minimum spanning tree (MST) of
the data distance graph is computed. Points are mapped by common trian-
gulation while traversing the MST, based on the previously mapped MST
neighbors serving as pivot point. However, in spite of the appealing heuris-
tic idea, MST computation and traversal itself still has O(N 2 )complexity.
Thus, in own prior work, an even faster mapping algorithm was developed
[2.26]. This alternative mapping, denoted as Visor mapping, also uses a tri-
angulation mapping step, but with three fixed global pivot points that are
heuristically chosen from the data set. The purpose of the pivot point de-
termination is to find the three most extruded data points that meet the
additional constraint of maximum mutual distance while enclosing the re-
maining data set. Based on centroid computation, these three data points
are successively selected as pivot points from the data set. These points are
placed first and the remaining N-3 data points are placed in the visualization
plane employing triangulation.
This algorithm, denoted by Visor [2.26], has O(N) complexity and thus
provides data projections with a very short response time and negligible sensi-
tivity to the database size. As shown by prior investigations with a mapping
quality measure, achievable mapping quality is similar to the NLM [2.26],
[2.24]. Due to their salient properties with regard to speed, convenience, and
transparence, distance-preserving mappings have been applied throughout
this work to the regarded semiconductor manufacturing data. In addition,
e cient hierarchical methods, offering a more delicate speed-accuracy trade-
off are available [2.23] and will be employed in the next stages of the work.
The unsupervised mapping methods discussed so far retain all features
from the high-dimensional feature space and compute a more compact opti-
mized feature space, e.g., for visualization and analysis purposes.
In contrast to this, feature selection actually helps to discard incoming
variables that have no or little significance for the tackled problem. It must
be remembered, that two very different aims can be pursued by the method
of feature selection. For classification tasks, the selection of an as-small-as-
possible group is desired, to allow generalization with a minimum classifica-
 
Search WWH ::




Custom Search