Biomedical Engineering Reference
In-Depth Information
obtain them will be shorter than that required
by methods using the original data); increased
efficiency (if the system needs to process less
information, less time will be required to
process it); improved understanding (when two
models solve the same task but one of them uses
less information, that one will be more thoroughly
understood; furthermore, a simpler model implies
easier knowledge extraction, easier understanding
and easier validation).
Nevertheless, numerous tests showed that this
is a highly multimodal problem, since several
combinations of variables (obtained using
different methods) offered similar results when
classifying the samples.
Data Description
In the present practical application, the small
spectral range measured by means of the IR
technique (wavenumbers from 1250 cm⁻¹ to 900
cm⁻¹) provides 176 absorbance values (variables
that measure light absorption). Figure 7 shows a
superposition of several typical profiles obtained
with IR spectroscopy.
The main goal is to establish the amount of
sugar in a sample, using the absorbance values
returned by the IR measurement as data. However,
the amount of data extracted from a sample by IR
is huge, so the direct application of mathematical
and/or computational methods, although possible,
requires a lot of time. It is important to establish
whether all those data provide the same amount
of information for sample differentiation, which
makes the problem an appropriate case for
variable selection techniques.
Prior to variable selection, data sets must be
created so that models can be developed and
validated. Samples with different amounts of pure
apple juice were prepared in the laboratory. In
addition, 23 apple juice-based beverages sold in
Spain were analysed (the declared amount of juice
printed on their labels was used as input data).
The samples were distributed in two ranges:
samples containing less than 20% pure apple juice
(Table 1) and samples with more than 20% pure
apple juice (Table 2). IR spectra were obtained
for all the samples.
These data were split into independent, distinct
data sets for extracting the rules (ANN training)
and for validation. The commercial samples were
used to further check the performance of the
model. It is worth noting that if a predicted value
does not match the one given on the label of a
commercial product, this might be owing either to
poor performance of the model (classification
error) or to inaccurate labelling of the commercial
beverage.
Classification Test Considering All
the Original Variables
The first test involved all the variables given by
IR spectroscopy. The ANN used all the absorbance
values of the training data to obtain a
reference classification model. Later, the results
obtained with this model were used to compare
the performance of the proposals over the
same data.
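
The comparison against a reference model can be sketched as a simple
error count over the same samples. This is only an illustration under
stated assumptions: the function names `count_errors` and
`compare_to_reference`, the class labels and the toy predictions are all
hypothetical, and no result from the study is reproduced here.

```python
def count_errors(predicted, actual):
    """Number of samples whose predicted class differs from the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p != a for p, a in zip(predicted, actual))


def compare_to_reference(actual, reference_pred, candidate_preds):
    """Error count of each candidate model next to the all-variables reference.

    candidate_preds maps a (hypothetical) model name to its predictions
    over the same samples that were used for the reference model.
    """
    report = {"reference (all variables)": count_errors(reference_pred, actual)}
    for name, pred in candidate_preds.items():
        report[name] = count_errors(pred, actual)
    return report


# Toy class labels, invented purely for illustration (not data from the study).
actual = ["low", "low", "high", "high", "high"]
reference_pred = ["low", "high", "high", "high", "high"]
candidates = {
    "subset A (hypothetical)": ["low", "low", "high", "low", "high"],
}

report = compare_to_reference(actual, reference_pred, candidates)
print(report)
```

Keeping the samples fixed and varying only the model is what makes the
error counts directly comparable.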
Different classification techniques (PLS,
SIMCA, Potential Curves, etc.) were also used
(Gestal et al., 2004), with very similar results.
However, the best results were achieved using ANN
(Freeman & Skapura, 1991), which are also very
useful for variable selection processes that
employ GA with ANN-based fitness functions.
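
The idea of a GA searching over subsets of the 176 variables can be
sketched as below. This is a minimal sketch, not the authors' method: the
text does not specify the genetic operators or parameters, so binary
tournament selection, one-point crossover, bit-flip mutation, elitism and
every numeric setting are assumptions. In particular, `toy_fitness` is a
hypothetical stand-in for an ANN-based fitness function, which would
instead train and score a network on the selected variables.

```python
import random

N_VARIABLES = 176  # absorbance values per spectrum, as stated in the text


def toy_fitness(mask):
    """Hypothetical stand-in for an ANN-based fitness (NOT the authors' function).

    Rewards selecting three arbitrary 'informative' positions and
    penalises every variable kept in the mask.
    """
    informative = (10, 80, 150)  # arbitrary illustrative indices
    hits = sum(mask[i] for i in informative)
    return hits - 0.01 * sum(mask)


def evolve(fitness, n_vars=N_VARIABLES, pop_size=30, generations=40,
           mutation_rate=0.01, seed=0):
    """Minimal generational GA over binary masks with one-individual elitism."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_pop = [ranked[0][:]]  # elitism: keep the best mask unchanged
        while len(next_pop) < pop_size:
            # binary tournament selection of two parents
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        population = next_pop
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_mask, history = evolve(toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print(len(selected), history[-1])
```

Because the best mask of each generation is copied unchanged into the
next one, the best fitness never decreases. Replacing `toy_fitness` with
a function that trains and validates a classifier on the selected
columns would turn this into a wrapper-style variable selection.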
An exhaustive study based on the results of
the different classification models led to the
conclusion that the set of samples with low
(2-20%) juice percentages is far more complex and
difficult to classify than the samples with higher
concentrations (25-100%). Indeed, the number
of errors was usually higher for the 2-20% range
both in calibration and validation. The classification
of the commercial samples agreed quite well with the
percentages of juice declared on the labels, except for
one particular sample. When that sample was studied