6 Pattern Analysis Module
The multivariate response obtained from the array of chemical gas sensors with
broad and partially overlapping selectivities will be utilized as an olfactory
blueprint to characterize the considered odors. Besides compensating for sensor
drift and noise, the preprocessing module is aimed at preparing data in the most
suitable form for the subsequent pattern analysis module. The latter has the
objective of identifying the most significant components for the considered
problem and of solving the prediction problem: classification, regression, or
clustering [10].
Classification addresses the problem of recognizing an unknown sample as
belonging to one of a predefined and learned set of classes; in regression tasks,
the objective is to predict a set of properties (e.g. concentration) for an ana-
lyte; finally, in clustering tasks the goal is to learn the structural relationship
among different odorants. The task of distinguishing the breath of a lung cancer
patient from a healthy breath belongs to the family of classification tasks.
Before analyzing data for classification, the feature matrix needs to
be treated in order to reduce the dimensionality of the problem and to maximize
the discriminative information of its components. Indeed, after preprocessing,
the feature matrix is typically characterized by high dimensionality and redun-
dancy. Besides the obvious issues of high complexity and computational cost,
the problem of high dimensionality is related to the more important curse of
dimensionality, which implies that the number of training examples must grow
exponentially with the number of features in order to learn an accurate model.
This means that beyond a certain dimensionality of the feature space, the perfor-
mance of the classifier decreases: for a fixed number of training samples, the optimal
number of feature dimensions must be found. Redundancy, which appears when
two or more features are collinear, leads the covariance matrix of the entire
dataset to be singular (and thus noninvertible), which leads, in turn, to numer-
ical problems in various statistical approaches. These considerations call for the
implementation of some dimensionality reduction technique; there are two main
dimensionality reduction approaches:
- Feature selection : the objective is to find the optimal or a suboptimal subset
of features starting from the initial feature set
- Feature projection : the objective is to project features into a lower-dimen-
sional space that maximizes the discriminative information or some defined
objective function
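The effect of redundancy described above, and the projection remedy, can be sketched numerically. The snippet below builds a hypothetical feature matrix (not from the source) in which two columns are collinear copies of sensor channels, shows that the resulting covariance matrix is rank-deficient and thus non-invertible, and then applies a plain PCA-style projection onto the eigenvectors of the covariance matrix to discard the null directions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor-array feature matrix: 100 samples, 4 independent channels.
X = rng.normal(size=(100, 4))
# Append two collinear (redundant) columns, mimicking sensors with
# strongly overlapping selectivities.
X = np.hstack([X, X[:, :2] * 1.5])

# Redundancy makes the covariance matrix singular (rank-deficient),
# hence non-invertible.
cov = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(cov))   # 4, although cov is 6x6

# Feature projection (PCA): project centered data onto the eigenvectors
# of cov associated with non-negligible eigenvalues.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(cov)
keep = eigvals > 1e-10 * eigvals.max()  # drop the two null directions
Z = Xc @ eigvecs[:, keep]
print(Z.shape)                          # (100, 4)
```

The projected matrix Z carries the same variance as X in four dimensions instead of six, and its covariance matrix is well conditioned, avoiding the numerical problems mentioned above.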
6.1 Feature Selection
The goal of feature selection is to find an optimal subset of M features from the
initial N-dimensional feature set (with M ≪ N) that maximizes the objective
function. Given a feature set X = {x_i, i = 1, ..., N}, we want to find a subset
Y_M = {x_i1, x_i2, ..., x_iM}, with M < N, that optimizes an objective function
J(Y) in some way related to the probability of correct classification. The objec-
tive function, which evaluates the goodness of the feature subset, can be related to
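A common suboptimal search strategy for this problem is sequential forward selection: greedily add the feature that most improves J(Y) until M features are chosen. The sketch below assumes a toy two-class dataset (not from the source) and uses a Fisher-like scatter ratio as one possible choice of J(Y); both the data and the objective are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class data: only features 0 and 3 carry class information
# (a hypothetical stand-in for discriminative sensor channels).
n = 200
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5))
X[:, 0] += 3 * y
X[:, 3] -= 2 * y

def J(subset):
    """Objective: ratio of between- to within-class scatter on the
    selected feature subset (one possible choice of J(Y))."""
    Xs = X[:, subset]
    m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    within = Xs[y == 0].var(axis=0).sum() + Xs[y == 1].var(axis=0).sum()
    return np.sum((m0 - m1) ** 2) / within

# Sequential forward selection: at each step add the feature that
# maximizes J, until M features are selected (suboptimal but cheap).
M, selected = 2, []
for _ in range(M):
    remaining = [i for i in range(X.shape[1]) if i not in selected]
    best = max(remaining, key=lambda i: J(selected + [i]))
    selected.append(best)

print(sorted(selected))  # [0, 3] -- the two informative features
```

Greedy forward selection evaluates only O(N·M) subsets instead of all C(N, M) combinations, which is why it is widely used despite offering no optimality guarantee.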