time series, although essentially domain independent, allows the assembly
of very accurate classifiers.
Experiments on different data sets show that, in terms of error rate,
the proposed method is highly competitive with previous approaches. On
several data sets it achieves better results than all previously reported
ones we are aware of. Moreover, although the strength of the method comes
from boosting, the experimental results using point-based predicates show
that the incorporation of interval predicates can significantly improve
the obtained classifiers, especially when fewer iterations are used.
An important aspect of the learning method is that it can deal directly
with variable-length series. As we have shown, the simple mechanism of
allowing the evaluation of a literal to result in an abstention, when
there are not enough data to evaluate it, makes it possible to learn
from series of different lengths. The symbolic nature of the base
classifiers facilitates their capacity to abstain. It also requires the
use of a boosting method able to work with abstentions in the base
classifiers.
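The abstention mechanism can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the particular literal form (a "series stays within a band over an interval" predicate) and all names are assumptions made for the example.

```python
def eval_interval_literal(series, start, end, low, high):
    """Hypothetical interval literal: does series[start:end] stay
    within [low, high]?  Returns +1 (true), -1 (false), or 0
    (abstention) when the series is too short to cover the interval."""
    if len(series) < end:
        return 0  # abstain: not enough data to evaluate the literal
    return 1 if all(low <= v <= high for v in series[start:end]) else -1

def classify(weighted_stumps, series):
    """Weighted vote of boosted one-literal stumps.  Abstaining stumps
    contribute nothing to the score, so series of any length, including
    incomplete prefixes of an example, can still be classified."""
    score = sum(w * stump(series) for w, stump in weighted_stumps)
    return 1 if score >= 0 else -1
```

For instance, with two stumps whose intervals end at positions 2 and 4, a series of length 2 is classified using only the first stump, while the second one abstains.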
Another feature of the method is its ability to classify incomplete exam-
ples. This early classification is indispensable for some applications,
where the time necessary to generate an example can be rather long and
it is not an option to wait until the full example is available. It is
important to notice that this early classification does not influence the
learning process. In particular, obtaining early classifiers requires
neither additional nor more complex classifiers. This early classification
capacity is another consequence of the symbolic nature of the classifier.
Nevertheless, an open issue is whether better results for early
classification could be obtained by modifying the learning process; for
instance, literals with later intervals could be somewhat penalized.
An interesting advantage of the method is its simplicity. From a user's
point of view, the method has only one free parameter, the number of iter-
ations. Moreover, the classifiers created with more iterations include the
previously obtained ones. Hence, it is possible (i) to select only an initial
fragment of the final classifier and (ii) to continue adding literals to a
previously obtained classifier. Although less important, from the
programmer's point of view the method is also rather simple. The
implementation of boosting stumps is one of the easiest among
classification methods.
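Both operations (i) and (ii) follow directly if the boosted classifier is represented, as is usual for boosting stumps, as an ordered list of weighted stumps; the sketch below (names assumed, not taken from the text) illustrates the nesting property.

```python
def truncate(weighted_stumps, k):
    """(i) Select an initial fragment: the first k weighted stumps are
    exactly the classifier that k boosting iterations would produce."""
    return weighted_stumps[:k]

def extend(weighted_stumps, new_stumps):
    """(ii) Resume boosting: appending stumps from further iterations
    yields the larger classifier without retraining the earlier part."""
    return weighted_stumps + new_stumps
```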
The main focus of this work has been classification accuracy, at the
expense of classifier comprehensibility. There are methods that produce