Audio Features - Intelligent Audio Analysis


Run-time benchmarks were carried out under Ubuntu (11.10) Linux on an AMD

FX-8120 at 3.1 GHz with 16 GB dual-channel DDR3-1866 RAM using only one of

the eight available cores (i.e., running all openSMILE components in a single thread).

Real-time factors (RTF) were computed by measuring the CPU time required for

extracting features from 30 min of monaural 16 kHz PCM (uncompressed) audio data, similar

to the benchmark in [ 94 ]. We used the latest SVN revision 822 (Dec/11/2012) for

this benchmark. 12 MFCC coefficients with first and second order delta coefficients

were extracted with an RTF of 0.008. The INTERSPEECH 2011

Speaker State Challenge baseline feature set was extracted with an RTF of 0.037,

and the INTERSPEECH 2012 Speaker Trait Challenge baseline feature set with an

RTF of 0.041.
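The real-time factor used above is the processing time divided by the duration of the processed audio, so an RTF below 1 means faster than real time. The following is a minimal sketch of that arithmetic, not the benchmark code itself: the function names are illustrative, and `measure_cpu_seconds` assumes a hypothetical extractor that runs inside the current Python process (CPU time of a separate process would have to be measured differently).

```python
import time

AUDIO_SECONDS = 30 * 60  # 30 min of audio, as in the benchmark above


def real_time_factor(cpu_seconds, audio_seconds):
    """RTF = processing time / duration of the processed audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return cpu_seconds / audio_seconds


def cpu_seconds_for(rtf, audio_seconds):
    """Invert the definition: CPU time implied by a reported RTF."""
    return rtf * audio_seconds


def measure_cpu_seconds(func):
    """CPU time (not wall-clock time) consumed by calling func() in this process."""
    start = time.process_time()
    func()
    return time.process_time() - start


# CPU time implied by the RTFs reported above for 30 min of audio
for name, rtf in [("12 MFCC + deltas", 0.008),
                  ("INTERSPEECH 2011 set", 0.037),
                  ("INTERSPEECH 2012 set", 0.041)]:
    print(f"{name}: about {cpu_seconds_for(rtf, AUDIO_SECONDS):.1f} s CPU time")
```

Read this way, an RTF of 0.008 on 30 min of audio corresponds to roughly 14.4 s of CPU time, and the two challenge feature sets to roughly 66.6 s and 73.8 s.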

To conclude, openSMILE was introduced as an example of a feature extractor

tailored to be an efficient, on-line as well as batch scriptable, open-source,

cross-platform, and extensible tool implemented in C++ with a well-structured API. Despite

being rather new, it is increasingly turning into a standard toolkit—in particular in

the field of computational paralinguistics. 15 Moreover, the openEAR project [ 104 ]

builds on openSMILE and extends it with integrated classification algorithms and

data-trained models for various Intelligent Audio Analysis tasks [ 104 ]. Development of

openSMILE is still active and even more features and signal processing components

such as Teager energy, ToBI pitch descriptors, Gabor filterbanks, and modulation

spectra are considered for integration.

Figure 6.13 gives a final overview of the principle of feature extraction.

6.6 Reduction and Selection of Features

Having discussed the principle of feature brute-forcing in the last section, it is next

important to be able to reduce the resulting features to the most relevant ones. Otherwise,

the number of parameters to be trained for a machine learning algorithm (which usually

increases with the number of features) may become too large in comparison

to the available amount of data.
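How the number of trainable parameters grows with the number of features can be made concrete with a small calculation. The sketch below assumes a linear classifier with one weight per feature and one bias per class, which is only one possible model; the feature counts, class count, and number of training instances are hypothetical and not taken from the text.

```python
def linear_model_parameters(num_features, num_classes):
    """Weights plus one bias per class for an assumed linear classifier."""
    return num_classes * (num_features + 1)


def parameters_per_instance(num_features, num_classes, num_instances):
    """Ratio of trainable parameters to available training instances."""
    if num_instances <= 0:
        raise ValueError("num_instances must be positive")
    return linear_model_parameters(num_features, num_classes) / num_instances


# Hypothetical setting: 2 classes, 1000 training instances
for num_features in (40, 400, 4000):
    ratio = parameters_per_instance(num_features, num_classes=2, num_instances=1000)
    print(f"{num_features} features: {ratio:.3f} parameters per instance")
```

With the data held fixed, the ratio grows in proportion to the feature count under this assumed model, which is the situation feature reduction and selection are meant to avoid.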

Feature selection usually first requires a measure for the evaluation of a feature's

merit. In terms of the quality of the resulting set of selected features, this is best

solved by employing the target classifier or regressor in a 'wrapper' manner and its

accuracy as evaluation measure [ 18 , 102 ]. In order to save computation time as highly

repeated training of and testing with a machine learning algorithm can easily become

computationally expensive, one can choose an alternative learning algorithm that can

be trained and evaluated faster. This comes, however, at the risk of introducing a

bias as the feature set is not optimised for the exact learning algorithm that will be

used later in a system. An alternative is to use 'filter' methods for the determination of

15 openSMILE was awarded third place in the ACM Multimedia 2010 Open-Source Software

Competition. It was further used as standard feature extractor for baseline computation and use by

participants in six research challenges.
