Digital Signal Processing Reference
In-Depth Information
Run-time benchmarks were carried out under Ubuntu (11.10) Linux on an AMD
FX-8120 at 3.1 GHz with 16 GB dual-channel DDR3-1866 RAM using only one of
the eight available cores (i.e., running all openSMILE components in a single thread).
Real-time factors (RTF) were computed by timing the CPU time required for extracting
features from 30 min of monaural 16 kHz PCM (uncompressed) audio data, similar
to the benchmark in [ 94 ]. We used the latest SVN revision 822 (Dec/11/2012) for
this benchmark. Twelve MFCCs with first and second order delta coefficients
were extracted with an RTF of 0.008. The INTERSPEECH 2011
Speaker State Challenge baseline feature set was extracted with an RTF of 0.037,
and the INTERSPEECH 2012 Speaker Trait Challenge baseline feature set with an
RTF of 0.041.
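To put these figures in perspective, the following minimal sketch converts the reported RTFs into CPU seconds for the 30 min test recording. It assumes the usual definition of the real-time factor (CPU time divided by the duration of the processed audio). The function names and the `measure_rtf` helper are hypothetical illustrations and not part of openSMILE.

```python
import time

AUDIO_SECONDS = 30 * 60  # 30 min of audio, as in the benchmark described above


def real_time_factor(cpu_seconds, audio_seconds):
    """RTF, assumed here to be CPU time divided by the audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return cpu_seconds / audio_seconds


def cpu_seconds_for(rtf, audio_seconds=AUDIO_SECONDS):
    """CPU time implied by a given RTF for audio of the given length."""
    return rtf * audio_seconds


def measure_rtf(extract, audio_seconds):
    """Time a callable (a hypothetical stand-in for a feature extractor)
    with the process CPU clock and return the resulting RTF."""
    start = time.process_time()
    extract()
    return real_time_factor(time.process_time() - start, audio_seconds)


if __name__ == "__main__":
    reported = [
        ("12 MFCC + deltas", 0.008),
        ("INTERSPEECH 2011 Speaker State set", 0.037),
        ("INTERSPEECH 2012 Speaker Trait set", 0.041),
    ]
    for name, rtf in reported:
        print(f"{name}: about {cpu_seconds_for(rtf):.1f} s of CPU time")
```

An RTF below 1 means that extraction runs faster than real time; all three configurations stay far below that limit on a single core.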
To conclude, openSMILE was introduced as an example of a feature extractor
designed to be efficient, usable on-line as well as in scripted batch processing,
open-source, cross-platform, and extensible, implemented in C++ with a
well-structured API. Despite being rather new, it is increasingly turning into a
standard toolkit, in particular in the field of computational paralinguistics
(see footnote 15). Moreover, the openEAR project [ 104 ] builds on openSMILE and
extends it with integrated classification algorithms and data-trained models for
various Intelligent Audio Analysis tasks. Development of openSMILE is still
active, and further features and signal processing components such as Teager
energy, ToBI pitch descriptors, Gabor filterbanks, and modulation spectra are
being considered for integration.
Figure 6.13 gives a final overview of the principle of feature extraction.
6.6 Reduction and Selection of Features
Having discussed the principle of feature brute-forcing in the last section, it is next
important to be able to reduce the resulting features to the most relevant ones.
Otherwise, the number of parameters to be trained for a machine learning algorithm,
which usually increases with the number of features, may become too large in
comparison to the available amount of data.
Feature selection usually first requires a measure for the evaluation of a feature's
merit. In terms of the quality of the resulting set of selected features, this is best
solved by employing the target classifier or regressor in a 'wrapper' manner, with its
accuracy as the evaluation measure [ 18 , 102 ]. Because repeatedly training and
testing a machine learning algorithm can easily become computationally expensive,
one can save computation time by choosing an alternative learning algorithm that can
be trained and evaluated faster. This comes, however, at the risk of introducing a
bias, as the feature set is not optimised for the exact learning algorithm that will be
used later in a system. An alternative is to use 'filter' methods for the determination of
Footnote 15: openSMILE was awarded third place in the ACM Multimedia 2010 Open-Source
Software Competition. It was further used as the standard feature extractor for baseline
computation, and by participants, in six research challenges.
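The 'wrapper' idea from this section can be illustrated with a small, self-contained sketch. It scores candidate feature subsets by the cross-validated accuracy of a classifier and grows the subset greedily (forward selection). Everything here is an assumption made for illustration: the greedy search strategy, the nearest-centroid classifier (a cheap stand-in of the 'faster alternative learner' kind mentioned above, not the target learner of any real system), the function names, and the synthetic data.

```python
import random


def squared_distance(row, centroid, cols):
    """Squared Euclidean distance restricted to the feature columns `cols`."""
    return sum((row[c] - centroid[k]) ** 2 for k, c in enumerate(cols))


def cv_accuracy(X, y, cols, folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier on `cols`."""
    n, correct = len(X), 0
    for f in range(folds):
        sums, counts = {}, {}
        for i in range(n):
            if i % folds == f:
                continue  # sample i belongs to the held-out fold
            acc = sums.setdefault(y[i], [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += X[i][c]
            counts[y[i]] = counts.get(y[i], 0) + 1
        centroids = {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}
        for i in range(f, n, folds):  # indices of the held-out fold
            pred = min(centroids,
                       key=lambda lab: squared_distance(X[i], centroids[lab], cols))
            correct += int(pred == y[i])
    return correct / n


def forward_select(X, y, max_features):
    """Greedy wrapper search: add the feature that raises CV accuracy most,
    and stop as soon as no remaining feature improves it."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(cv_accuracy(X, y, selected + [c]), c) for c in remaining]
        score, col = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


def make_toy_data(n=200, seed=0):
    """Synthetic two-class data: columns 0 and 3 carry class information,
    the other four columns are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(6)]
        row[0] += 3.0 * label
        row[3] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected, best = forward_select(X, y, max_features=3)
    print("selected feature columns:", selected, "CV accuracy:", round(best, 3))
```

Each candidate subset costs one full cross-validation run, which is why the cost of the learner inside the loop dominates. Swapping in the actual target classifier removes the bias discussed above, at the price of a correspondingly longer search.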
 