Database Reference
In-Depth Information
be observed if each training set were completely independent of the others
training sets.
4.3 Computational Complexity
Another useful criterion for comparing inducers and classifiers is their
computational complexity. Strictly speaking computational complexity
is the amount of CPU consumed by each inducer. It is convenient to
differentiate between three metrics of computational complexity:
Computational complexity for generating a new classifier: This is the
most important metric, especially when there is a need to scale the data
mining algorithm to massive datasets. Because most of the algorithms
have computational complexity, which is worse than linear in the numbers
of tuples, mining massive datasets might be prohibitively expensive.
Computational complexity for updating a classifier: Given new data,
what is the computational complexity required for updating the current
classifier such that the new classifier reflects the new data?
Computational complexity for classifying a new instance: Generally this
type of metric is neglected because it is relatively small. However, in
certain methods (like k -nearest neighborhood) or in certain real-time
applications (like anti-missiles applications), this type can be critical.
4.4 Comprehensibility
Comprehensibility criterion (also known as interpretability) refers to how
well humans grasp the induced classifier. While the generalization error
measures how the classifier fits the data, comprehensibility measures the
“mental fit” of that classifier.
Many techniques, like neural networks or support vector machines,
are designed solely to achieve accuracy. However, as their classifiers are
represented using large assemblages of real valued parameters, they are
also dicult to understand and are referred to as black-box models.
However, it is often important for the researcher to be able to inspect
an induced classifier. For such domains as medical diagnosis, users must
understand how the system makes its decisions in order to be confident
of the outcome. Since data mining can also play an important role in
the process of scientific discovery, a system may discover salient features
in the input data whose importance was not previously recognized. If
the representations formed by the inducer are comprehensible, then these
Search WWH ::




Custom Search