contains only cases of the same class as instance $X_i$. The authors define two properties,
reachability and coverage:
$$\mathrm{Coverage}(X_i) = \{\, X'_i \in TR : X_i \in L(X'_i) \,\}, \tag{8.4}$$
$$\mathrm{Reachability}(X_i) = \{\, X'_i \in TR : X'_i \in L(X_i) \,\}, \tag{8.5}$$
In the first phase, ICF uses the ENN algorithm to remove noise from the training
set. In the second phase, ICF removes each instance $X_i$ for which the set
Reachability($X_i$) is larger than the set Coverage($X_i$), that is, it contains
more instances. This test is applied to every instance in TR. ICF then recalculates
the reachability and coverage properties and restarts the second phase, as long as
any progress is observed.
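The second phase of ICF described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Euclidean distance. It takes the local set L of an instance to be every instance strictly closer than its nearest different-class instance. It compares the two sets by size, and it omits the ENN noise filter of the first phase. All function names are hypothetical.

```python
import math

def local_sets(X, y):
    """L(i): indices strictly closer to instance i than its nearest enemy (assumed definition)."""
    n = len(X)
    sets = []
    for i in range(n):
        enemy_dists = [math.dist(X[i], X[j]) for j in range(n) if y[j] != y[i]]
        radius = min(enemy_dists) if enemy_dists else math.inf
        sets.append({j for j in range(n) if math.dist(X[i], X[j]) < radius})
    return sets

def coverage_and_reachability(X, y):
    """Return (coverage, reachability) as lists of index sets."""
    L = local_sets(X, y)
    n = len(X)
    reach = L                                                   # j in L(i)
    cover = [{j for j in range(n) if i in L[j]} for i in range(n)]  # i in L(j)
    return cover, reach

def icf_second_phase(X, y):
    """Repeatedly drop instances whose reachability set is larger than their coverage set."""
    keep = list(range(len(X)))
    progress = True
    while progress:
        Xs = [X[i] for i in keep]
        ys = [y[i] for i in keep]
        cover, reach = coverage_and_reachability(Xs, ys)
        survivors = [keep[k] for k in range(len(keep))
                     if len(reach[k]) <= len(cover[k])]
        progress = len(survivors) < len(keep)
        keep = survivors
    return keep

# Toy one-dimensional data: three instances of class 0, one of class 1.
X = [(0,), (3,), (4,), (5,)]
y = [0, 0, 0, 1]
print(icf_second_phase(X, y))  # [2, 3]
```

On this toy data the sketch discards the two interior instances of class 0 over two passes and keeps only the pair that sits on the class border.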
Hit-Miss Network Algorithms (HMN) [116]: Hit-Miss Networks are directed
graphs whose nodes are the instances in TR and whose edges connect
each instance to its nearest neighbor from each class. The edge connecting
an instance to the nearest neighbor of its own class is called a “Hit”, and the rest
of its edges are called “Miss”.
The HMNC method builds the network and removes every node that is not connected. The
remaining nodes form the final set S.
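The network construction and the HMNC reduction can be sketched as follows. This is a hedged illustration, not the reference implementation. It assumes Euclidean distance and reads "not connected" as "has no incoming edge", since every node has outgoing edges by construction. The names `build_hmn` and `hmnc` are hypothetical.

```python
import math

def build_hmn(X, y):
    """Return directed edges (source, target, kind), kind being 'hit' or 'miss'."""
    edges = []
    classes = sorted(set(y))
    for i in range(len(X)):
        for c in classes:
            # Candidate neighbors of class c, excluding the instance itself.
            candidates = [j for j in range(len(X)) if j != i and y[j] == c]
            if not candidates:
                continue
            nearest = min(candidates, key=lambda j: math.dist(X[i], X[j]))
            edges.append((i, nearest, "hit" if c == y[i] else "miss"))
    return edges

def hmnc(X, y):
    """Keep only nodes with at least one incoming edge (assumed reading of HMNC)."""
    edges = build_hmn(X, y)
    return sorted({target for _, target, _ in edges})

X = [(0,), (3,), (4,), (5,)]
y = [0, 0, 0, 1]
print(hmnc(X, y))  # [1, 2, 3]
```

Here instance 0 is nobody's nearest neighbor, so it receives no edge and is dropped.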
Hit Miss Network Edition (HMNE): Starting from the output of HMNC, HMNE
applies these four rules to prune the network:
1. Every point with more “Miss” edges than “Hit” edges is flagged for removal.
2. If the number of non-flagged points of a class is too low, points of that class
with at least one “Hit” are unflagged.
3. If there are more than three classes, some points of each class with a low number
of “Miss” edges are unflagged.
4. Points that are the “Hit” of 25 % or more of the instances of a class are unflagged.
Finally, the instances that remain flagged for removal are deleted from the
network to build the final set S.
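Rule 1 alone can be illustrated with a short Python sketch. It assumes that the “Hit” and “Miss” edges of a point are counted on its incoming edges, which the text does not state explicitly. Rules 2 to 4 are left out because their thresholds ("too low", "some points") are not specified here. The edge list and the name `flag_by_rule_1` are hypothetical.

```python
from collections import Counter

def flag_by_rule_1(edges, n_points):
    """Flag points whose incoming 'miss' edges outnumber their incoming 'hit' edges."""
    hit_in = Counter(t for _, t, kind in edges if kind == "hit")
    miss_in = Counter(t for _, t, kind in edges if kind == "miss")
    return {p for p in range(n_points) if miss_in[p] > hit_in[p]}

# Hypothetical toy network: (source, target, kind).
edges = [
    (0, 1, "hit"), (0, 3, "miss"),
    (1, 2, "hit"), (1, 3, "miss"),
    (2, 1, "hit"), (2, 3, "miss"),
    (3, 2, "miss"),
]
print(flag_by_rule_1(edges, 4))  # {3}
```

In this toy network point 3 is the only member of its class, so rule 1 flags it. Cases like this are what the unflagging rules are meant to protect.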
Hit Miss Network Edition Iterative (HMNEI): The HMNE method can be
applied iteratively until the 1-NN generalization accuracy on the original
training set, computed using the reduced set, decreases.
8.4.3.6 Mixed+Wrapper
Encoding Length Family (Explore) [18]: Cameron-Jones used an encoding
length heuristic to determine how well the subset S describes TR. His
algorithms use a cost function defined by:
$$\mathrm{COST}(s, n, x) = F(s, n) + s \log_2(\Omega) + F(x, n - s) + x \log_2(\Omega - 1) \tag{8.6}$$
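The cost can be evaluated with a few lines of Python. The sketch below rests on assumptions that this passage does not spell out. The arguments of the two logarithms are taken to be the number of classes and that number minus one. F(m, n) is taken to be log* of the number of ways to choose at most m of n instances, where log* sums the positive terms of log2(v), log2(log2(v)), and so on, which is the form in which the encoding length heuristic is usually quoted. The names `log_star`, `F`, `cost`, and `n_classes` are hypothetical.

```python
import math

def log_star(v):
    """Sum of the positive terms of log2(v), log2(log2(v)), ... (assumed definition)."""
    total = 0.0
    term = math.log2(v)
    while term > 0:
        total += term
        term = math.log2(term)
    return total

def F(m, n):
    """Assumed cost of encoding which m of the n available instances are selected."""
    return log_star(sum(math.comb(n, j) for j in range(m + 1)))

def cost(s, n, x, n_classes):
    """Encoding length cost: s retained, n total, x misclassified; needs n_classes >= 2."""
    return (F(s, n) + s * math.log2(n_classes)
            + F(x, n - s) + x * math.log2(n_classes - 1))

print(cost(1, 3, 0, 2))  # 4.0
```

With one of three instances retained, no errors, and two classes, the selection term contributes 3 bits and the class label of the retained instance 1 bit.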