Ensembles of Least Squares Classifiers with Randomized Kernels

Kari Torkkola¹ and Eugene Tuv²

¹ Motorola, Intelligent Systems Lab, Tempe, AZ, USA
  kari.torkkola@motorola.com
² Intel, Analysis and Control Technology, Chandler, AZ, USA
  eugene.tuv@intel.com
Summary. For the recent NIPS-2003 feature selection challenge we studied ensembles of regularized least squares classifiers (RLSC). We showed that stochastic ensembles of simple least squares kernel classifiers give the same level of accuracy as the best single RLSC, and the results achieved were ranked among the best at the challenge. We also showed that the performance of a single RLSC is much more sensitive to the choice of kernel width than that of an ensemble. As a continuation of this work, we demonstrate that stochastic ensembles of least squares classifiers with randomized kernel widths and OOB (out-of-bag) post-processing often outperform the best single RLSC, and require practically no parameter tuning. We used the same set of very high-dimensional classification problems presented at the NIPS challenge. Fast exploratory Random Forests were applied first for variable filtering.
1 Introduction
Regularized least-squares regression and classification date back to the work of Tikhonov and Arsenin [17], and have recently been re-advocated and revived by Poggio, Smale and others [6, 13-15]. The Regularized Least Squares Classifier (RLSC) is an old technique that combines a quadratic loss function with regularization in a reproducing kernel Hilbert space, leading to the solution of a simple linear system. In many of the cases reported in the work cited above, this simple RLSC appears to equal or exceed the performance of support vector machines and other modern developments in machine learning.
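As a point of reference, the following is a minimal sketch in Python/NumPy of a single RLSC with a Gaussian kernel; the regularization constant lam, the kernel width, and the exact scaling of the regularizer are free choices here, not the authors' settings:

    import numpy as np

    def gaussian_kernel(A, B, width):
        """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * width ** 2))

    def rlsc_fit(X, y, width, lam):
        """Fit an RLSC on labels y in {-1, +1}: solve (K + lam*n*I) c = y."""
        n = X.shape[0]
        K = gaussian_kernel(X, X, width)
        return np.linalg.solve(K + lam * n * np.eye(n), y)

    def rlsc_predict(X_train, c, X_test, width):
        """Decision values f(x) = sum_i c_i k(x_i, x); the sign gives the class."""
        return gaussian_kernel(X_test, X_train, width) @ c

An ensemble in the spirit of the Summary above would fit many such classifiers, each on a random subsample of the data and with a kernel width drawn at random, and then average their decision values.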
The combination of RLSC with Gaussian kernels and the usual choice of spherical covariances gives equal weight to every component of the feature vector. This poses a problem if a large proportion of the features consists of noise, which is exactly the case with the datasets of the challenge. In order to succeed in these circumstances, noise variables need to be removed or weighted down. We apply ensemble-based variable filtering to remove noise variables: a Random Forest (RF) is trained for the classification task, and an importance measure for each variable is derived from the forest [4]. Only the highest-ranking variables are then retained as inputs to the classifier.
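For concreteness, a minimal sketch of such a filtering step, using scikit-learn's RandomForestClassifier as a stand-in for the Random Forest used in the paper; the cut-off parameter n_keep is hypothetical and would be chosen per dataset:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def rf_variable_filter(X, y, n_keep, n_trees=100, random_state=0):
        """Rank variables by RF importance and keep the top n_keep.

        A sketch of ensemble-based variable filtering; the paper derives
        importances from its own Random Forest, not from scikit-learn.
        """
        rf = RandomForestClassifier(n_estimators=n_trees,
                                    random_state=random_state)
        rf.fit(X, y)
        # Indices of variables sorted by decreasing importance.
        top = np.argsort(rf.feature_importances_)[::-1][:n_keep]
        return X[:, top], top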