6 Pattern Analysis Module
The multivariate response obtained from the array of chemical gas sensors with
broad and partially overlapping selectivities will be utilized as an olfactory
blueprint to characterize the considered odors. Besides compensating for sensor
drift and noise, the preprocessing module is aimed at preparing data in the most
suitable form for the subsequent pattern analysis module. The latter has the
objective of identifying the most significant components for the considered
problem and of solving the prediction problem: classification, regression, or
clustering [10].
Classification addresses the problem of recognizing an unknown sample as
belonging to one of a predefined and learned set of classes; in regression tasks,
the objective is to predict a set of properties (e.g. concentration) for an ana-
lyte; finally, in clustering tasks the goal is to learn the structural relationship
among different odorants. The task of distinguishing the breath of a lung cancer
patient from a healthy breath belongs to the family of classification tasks.
Before analyzing data for classification, the feature matrix needs to
be treated in order to reduce the dimensionality of the problem and to maximize
the discriminative information of its components. Indeed, after preprocessing,
the feature matrix is typically characterized by high dimensionality and redun-
dancy. Besides the obvious issues of high complexity and computational cost,
the problem of high dimensionality is related to the more important curse of
dimensionality, which implies that the number of training examples must grow
exponentially with the number of features in order to learn an accurate model.
This means that beyond a certain dimensionality of the feature space, the perfor-
mance of the classifier decreases: for a fixed number of training samples, the optimal
number of feature dimensions must be found. Redundancy, which appears when
two or more features are collinear, leads the covariance matrix of the entire
dataset to be singular (and thus noninvertible), which leads, in turn, to numer-
ical problems in various statistical approaches. These considerations call for the
implementation of some dimensionality reduction technique; there are two main
dimensionality reduction approaches:
- Feature selection : the objective is to find the optimal or a suboptimal subset
of features starting from the initial feature set
- Feature projection : the objective is to project features into a lower-dimen-
sional space that maximizes the discriminative information or some defined
objective function
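The effect of redundancy described above, and the projection remedy, can be sketched numerically. The snippet below builds a hypothetical feature matrix (not from the source) in which two columns are collinear copies of sensor channels, shows that the resulting covariance matrix is rank-deficient and thus non-invertible, and then applies a plain PCA-style projection onto the eigenvectors of the covariance matrix to discard the null directions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor-array feature matrix: 100 samples, 4 independent channels.
X = rng.normal(size=(100, 4))
# Append two collinear (redundant) columns, mimicking sensors with
# strongly overlapping selectivities.
X = np.hstack([X, X[:, :2] * 1.5])

# Redundancy makes the covariance matrix singular (rank-deficient),
# hence non-invertible.
cov = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(cov))   # 4, although cov is 6x6

# Feature projection (PCA): project centered data onto the eigenvectors
# of cov associated with non-negligible eigenvalues.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(cov)
keep = eigvals > 1e-10 * eigvals.max()  # drop the two null directions
Z = Xc @ eigvecs[:, keep]
print(Z.shape)                          # (100, 4)
```

The projected matrix Z carries the same variance as X in four dimensions instead of six, and its covariance matrix is well conditioned, avoiding the numerical problems mentioned above.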
6.1 Feature Selection
The goal of feature selection is to find an optimal subset of M features from the
initial N-dimensional feature set (with M ≪ N) that maximizes the objective
function. Given a feature set X = {x_i, i = 1, ..., N}, we want to find a subset
Y_M = {x_i1, x_i2, ..., x_iM}, with M < N, that optimizes an objective function
J(Y) in some way related to the probability of correct classification. The objec-
tive function, which evaluates the goodness of the feature subset, can be related to
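A common suboptimal search strategy for this problem is sequential forward selection: greedily add the feature that most improves J(Y) until M features are chosen. The sketch below assumes a toy two-class dataset (not from the source) and uses a Fisher-like scatter ratio as one possible choice of J(Y); both the data and the objective are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class data: only features 0 and 3 carry class information
# (a hypothetical stand-in for discriminative sensor channels).
n = 200
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5))
X[:, 0] += 3 * y
X[:, 3] -= 2 * y

def J(subset):
    """Objective: ratio of between- to within-class scatter on the
    selected feature subset (one possible choice of J(Y))."""
    Xs = X[:, subset]
    m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    within = Xs[y == 0].var(axis=0).sum() + Xs[y == 1].var(axis=0).sum()
    return np.sum((m0 - m1) ** 2) / within

# Sequential forward selection: at each step add the feature that
# maximizes J, until M features are selected (suboptimal but cheap).
M, selected = 2, []
for _ in range(M):
    remaining = [i for i in range(X.shape[1]) if i not in selected]
    best = max(remaining, key=lambda i: J(selected + [i]))
    selected.append(best)

print(sorted(selected))  # [0, 3] -- the two informative features
```

Greedy forward selection evaluates only O(N·M) subsets instead of all C(N, M) combinations, which is why it is widely used despite offering no optimality guarantee.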