weakness of NSAs, especially those in real-valued representation. Apart from the general difficulties common to learning algorithms, such as high dimensionality, there are some issues with RNS, which are as follows:
Because the matching process in an NSA (or in any learning algorithm) is built on the concept of affinity or distance, results based on converted discrete data may be fallacious. For example, the converted points will be distributed on separate (parallel) planes in the real space. The distance within one plane should not be interpreted in the same way as the distance between the planes; the connotation of being closer or farther apart is not the same as in the original data space. Therefore, the converted real-valued data not only fail to contribute to measuring the distance or affinity between two points, they also limit the reasonable choice of a threshold for the other fields in the data.
The matching rule usually takes the form of a distance measure; the choice of a specific matching rule should depend on the representation and the detector shape.
Detector coverage depends on the interpretation of the training data, which in most cases is incomplete (a one-class classification problem). The statistical estimate of coverage obtained by random sampling does not take into consideration the probability distribution of the data to be tested. Thus, the notion of sufficient coverage is always biased, and the size of the bias depends on how much the actual distribution differs from a uniform distribution. The detection rate, in contrast, depends on the actual distribution of the test data.
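The second and third issues can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: a single hypothetical detector in the unit square, a matching rule passed in as a distance function (Euclidean for a hypersphere, Chebyshev for a hypercube), and two invented test distributions. Coverage is estimated by uniform random sampling, while the detection rate is measured on the test distributions themselves.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance: a detector with radius r covers a hypersphere."""
    return math.dist(a, b)

def chebyshev(a, b):
    """Chebyshev (max-coordinate) distance: the covered region is a hypercube."""
    return max(abs(x - y) for x, y in zip(a, b))

def matches(point, center, radius, metric=euclidean):
    """Matching rule: the detector fires when the distance is within its radius."""
    return metric(point, center) <= radius

def hit_rate(samples, detectors, metric=euclidean):
    """Fraction of samples matched by at least one (center, radius) detector."""
    hits = sum(
        any(matches(p, c, r, metric) for c, r in detectors) for p in samples
    )
    return hits / len(samples)

# Hypothetical detector set: one hypersphere of radius 0.2 in [0, 1]^2.
detectors = [((0.25, 0.25), 0.2)]
rng = random.Random(42)

# Coverage estimate: uniform random samples over the whole space.
uniform = [(rng.random(), rng.random()) for _ in range(20_000)]
coverage = hit_rate(uniform, detectors)

# Two invented test distributions, evaluated against the same detectors.
far = [(rng.gauss(0.75, 0.05), rng.gauss(0.75, 0.05)) for _ in range(2_000)]
near = [(rng.gauss(0.25, 0.05), rng.gauss(0.25, 0.05)) for _ in range(2_000)]

print(f"estimated coverage:    {coverage:.3f}")
print(f"detection rate (far):  {hit_rate(far, detectors):.3f}")
print(f"detection rate (near): {hit_rate(near, detectors):.3f}")
```

The coverage estimate approximates the detector's area (about 0.126, that is, πr² for r = 0.2) no matter where the test data lie, whereas the detection rate should come out near 0 for the "far" data and near 1 for the "near" data. The metric also changes the detector shape: the point (0.4, 0.4) lies outside the Euclidean hypersphere (distance about 0.212) but inside the Chebyshev hypercube (distance 0.15).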
Although the applicability of NSAs is still an open debate, many difficulties reported in recent years are not related to the RNS algorithm itself. For example, high dimensionality, the choice of optimal control parameters, and the need for a good data model of the application domain are important implementation issues for all methods.
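The first issue listed above, the distortion introduced by converting discrete data to real values, can be made concrete with a toy record type. The field names, the three protocol values, and their numeric codes below are all hypothetical; the point is only that any such encoding places records on separate parallel planes and imposes distances the original categories do not have.

```python
import math

# Hypothetical encoding of a categorical "protocol" field onto [0, 1].
# Any other spacing would be equally arbitrary.
PROTOCOL_CODE = {"tcp": 0.0, "udp": 0.5, "icmp": 1.0}

def to_real(duration, protocol):
    """Map a (duration, protocol) record to a point in R^2.
    Records sharing a protocol lie on the same line (a plane in higher dims)."""
    return (duration, PROTOCOL_CODE[protocol])

a = to_real(0.2, "tcp")
b = to_real(0.2, "udp")   # differs from a only in protocol
c = to_real(0.2, "icmp")  # differs from a only in protocol
d = to_real(0.6, "tcp")   # differs from a only in duration

d_ab = math.dist(a, b)
d_ac = math.dist(a, c)
d_ad = math.dist(a, d)

# tcp/icmp come out twice as far apart as tcp/udp, although in the original
# data the three protocols are simply "different".
print(f"tcp-udp: {d_ab:.2f}, tcp-icmp: {d_ac:.2f}, same protocol: {d_ad:.2f}")

# A distance threshold is tied to the arbitrary plane spacing:
# with radius 0.45, a detector at `a` matches `d` but neither `b` nor `c`.
radius = 0.45
matched = [name for name, p in zip("bcd", (b, c, d)) if math.dist(a, p) <= radius]
print(matched)
```

Changing the codes (for example to 0, 1, 2) would reorder which records count as "close", which is why a threshold chosen for the real-valued field cannot be interpreted independently of the encoding.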
4.8 Positive Selection (Detection)
In contrast to NS, “positive detection techniques” are widely used in pattern recognition, clustering, and other domains; they generate a set of detectors that match self-points (instead of nonself points). In this case, a model of the self-set (training data) is used to classify a sample as either self or nonself. A simple model of positive detection can be built with a nearest-neighbor approach: if a point lies in the neighborhood of a self sample, it is labeled as belonging to the self-set (Figure 4.19).
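A minimal sketch of the nearest-neighbor model just described, assuming Euclidean distance, a fixed radius, and a small invented self-set (the data, the radius, and the function names are hypothetical). Checking whether the nearest self sample lies within the radius is equivalent to asking whether the point falls inside at least one hypersphere centered on a self sample.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]

def nearest_self_distance(x: Point, self_set: Sequence[Point]) -> float:
    """Distance from x to its nearest neighbor in the self set (brute force)."""
    return min(math.dist(x, s) for s in self_set)

def classify(x: Point, self_set: Sequence[Point], radius: float) -> str:
    """Positive detection: x is 'self' if it falls inside the hypersphere
    of the given radius centered on at least one self sample."""
    return "self" if nearest_self_distance(x, self_set) <= radius else "nonself"

# Hypothetical training (self) data and radius.
self_set = [(0.1, 0.1), (0.2, 0.15), (0.8, 0.9)]
radius = 0.1

for x in [(0.15, 0.12), (0.5, 0.5), (0.85, 0.85)]:
    print(x, classify(x, self_set, radius))
```

The three queries are labeled self, nonself, and self. The linear scan is fine for a toy self-set; for large training sets a spatial index such as a k-d tree would typically replace it.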
Generally, a positive detector defines the neighborhood by assuming a hypersphere with a certain radius centered on each of the self-points. Moreover, detectors can be defined in a more sophisticated way by using some clustering algorithm