Evaluation of Classification Trees - Data Mining with Decision Trees: Theory and Applications

Database Reference

In-Depth Information

be observed if each training set were completely independent of the others

training sets.

4.3 Computational Complexity

Another useful criterion for comparing inducers and classifiers is their

computational complexity. Strictly speaking computational complexity

is the amount of CPU consumed by each inducer. It is convenient to

differentiate between three metrics of computational complexity:

•

Computational complexity for generating a new classifier: This is the

most important metric, especially when there is a need to scale the data

mining algorithm to massive datasets. Because most of the algorithms

have computational complexity, which is worse than linear in the numbers

of tuples, mining massive datasets might be prohibitively expensive.

•

Computational complexity for updating a classifier: Given new data,

what is the computational complexity required for updating the current

classifier such that the new classifier reflects the new data?

•

Computational complexity for classifying a new instance: Generally this

type of metric is neglected because it is relatively small. However, in

certain methods (like k -nearest neighborhood) or in certain real-time

applications (like anti-missiles applications), this type can be critical.

4.4 Comprehensibility

Comprehensibility criterion (also known as interpretability) refers to how

well humans grasp the induced classifier. While the generalization error

measures how the classifier fits the data, comprehensibility measures the

“mental fit” of that classifier.

Many techniques, like neural networks or support vector machines,

are designed solely to achieve accuracy. However, as their classifiers are

represented using large assemblages of real valued parameters, they are

also dicult to understand and are referred to as black-box models.

However, it is often important for the researcher to be able to inspect

an induced classifier. For such domains as medical diagnosis, users must

understand how the system makes its decisions in order to be confident

of the outcome. Since data mining can also play an important role in

the process of scientific discovery, a system may discover salient features

in the input data whose importance was not previously recognized. If

the representations formed by the inducer are comprehensible, then these

Search WWH ::

Custom Search

Home