When no instance in storage is acceptable, we adopt a policy that assumes
that at least one instance is acceptable. If none of the saved instances are
acceptable, a random number r is generated from [1, n], where n is the
number of saved instances, and the r most similar saved instances'
classification records are updated. If there is at least one acceptable
instance, say i, then the instances whose classification records need
updating are those that lie in the hyper-sphere centered on i, with radius
equal to the normalized distance between i and its nearest neighbor.
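To make this update policy concrete, here is a minimal Python sketch. It assumes Euclidean distance over already-normalized features and a simple record structure with correct/attempt counters; the names dist and update_records and the dictionary fields are hypothetical, not part of IB3's original formulation.

```python
import math
import random

def dist(a, b):
    """Euclidean distance over (assumed already normalized) feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def update_records(saved, new_inst, acceptable):
    """Update classification records after processing new_inst.

    saved: list of dicts {'features', 'label', 'correct', 'attempts'};
    acceptable: the subset of saved instances currently judged acceptable.
    """
    if not saved:
        return
    if not acceptable:
        # No acceptable instance: draw r from [1, n] and update the records
        # of the r saved instances most similar to the new instance.
        by_sim = sorted(saved, key=lambda s: dist(s['features'], new_inst['features']))
        r = random.randint(1, len(saved))
        to_update = by_sim[:r]
    else:
        # At least one acceptable instance: take the most similar acceptable
        # instance i and update every saved instance inside the hyper-sphere
        # centered on i, with radius equal to the distance between i and its
        # nearest neighbor among the other saved instances.
        i = min(acceptable, key=lambda s: dist(s['features'], new_inst['features']))
        others = [s for s in saved if s is not i]
        radius = min(dist(i['features'], s['features']) for s in others) if others else 0.0
        to_update = [s for s in saved if dist(i['features'], s['features']) <= radius]
    for s in to_update:
        s['attempts'] += 1
        if s['label'] == new_inst['label']:
            s['correct'] += 1
```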
IB3 employs a confidence interval of proportions test to determine whether
an instance is acceptable, mediocre, or noisy. Confidence intervals are
computed from both the instance's current classification accuracy and the
observed relative frequency of the class it belongs to. An instance is
acceptable when the lower endpoint of its accuracy interval is greater than
the upper endpoint of its class frequency interval. Similarly, an instance is
noisy when the upper endpoint of its accuracy interval is less than the lower
endpoint of its class frequency interval. If the two intervals overlap, the
instance is mediocre, and a further decision process is employed to decide
whether it should be accepted or discarded.
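The intervals themselves are usually computed with the standard confidence interval for a proportion. The following Python sketch illustrates the three-way test under that assumption; the z values for acceptance and dropping (0.9 and 0.7) and the function names are illustrative choices, not taken from the text.

```python
import math

def proportion_interval(successes, trials, z):
    """Confidence interval for a proportion (Wilson-style formula)."""
    if trials == 0:
        return 0.0, 1.0
    p = successes / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    denom = 1 + z2 / trials
    return (centre - spread) / denom, (centre + spread) / denom

def judge(correct, attempts, class_count, total_instances,
          z_accept=0.9, z_drop=0.7):  # illustrative confidence factors
    """Classify a saved instance as 'acceptable', 'noisy', or 'mediocre'."""
    # Acceptable: accuracy interval lies entirely above the class-frequency interval.
    acc_lo, _ = proportion_interval(correct, attempts, z_accept)
    _, freq_hi = proportion_interval(class_count, total_instances, z_accept)
    if acc_lo > freq_hi:
        return 'acceptable'
    # Noisy: accuracy interval lies entirely below the class-frequency interval.
    _, acc_hi = proportion_interval(correct, attempts, z_drop)
    freq_lo, _ = proportion_interval(class_count, total_instances, z_drop)
    if acc_hi < freq_lo:
        return 'noisy'
    return 'mediocre'
```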
The learning performance of IB3 is highly sensitive to the number of
irrelevant features used in instance descriptions. Moreover, with increasing
dimensionality its storage requirements increase exponentially while its
learning rate decreases exponentially. IB3 cannot represent overlapping
concepts, since it assumes that every instance is a member of exactly one
concept. Fortunately, this limitation can be addressed by learning a separate
description for each concept.
IBL has the following advantages. The approach is simple and relatively
robust, and its algorithms have a relatively relaxed concept bias. They learn
piecewise-linear approximations of concepts incrementally, with a faster
learning rate than other algorithms whose bias is not satisfied by the target
concepts, especially when the boundary of the target concept is not parallel to
the feature dimensions. Finally, the updating costs in IBL algorithms are
relatively low. IBL updating costs include classification costs, which
require O(|N| * |A|) feature examinations, where |N| is the number of
saved instances and |A| is the number of features used in instance
descriptions. C4.5 requires O(|I| * |A|^2) feature examinations, where |I|
is the size of the training set. Meanwhile, parallel processing methods and
indexing techniques can reduce these costs.
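The O(|N| * |A|) figure corresponds to a plain linear scan, in which every feature of every saved instance is examined once per classification. A minimal nearest-neighbor sketch (hypothetical names) illustrating this cost:

```python
def classify(saved, query_features):
    """Nearest-neighbour classification by linear scan.

    Examines |A| features for each of the |N| saved instances,
    i.e. O(|N| * |A|) feature examinations per query.
    """
    best_label, best_dist = None, float('inf')
    for inst in saved:                                   # |N| iterations
        d = sum((a - b) ** 2                             # |A| feature examinations
                for a, b in zip(inst['features'], query_features))
        if d < best_dist:
            best_label, best_dist = inst['label'], d
    return best_label
```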