Other related feature extraction tools used for speech research include, e.g., the Hidden Markov Model Toolkit (HTK) [20], the PRAAT Software [95], the Speech Filing System⁷ (SFS), the Auditory Toolbox,⁸ a Matlab™ toolbox⁹ by Raul Fernandez [96], the Tracter framework [97], and the SNACK¹⁰ package for the Tcl scripting language. However, not all of these tools are distributed under a permissive open-source license; HTK and SFS, for example, are not. The SNACK package has been unsupported since 2004.
For Music Information Retrieval, many feature extraction programs under a permissive open-source license exist, e.g., the lightweight ANSI C library libXtract,¹¹ the Java-based jAudio extractor [98], the Music Analysis, Retrieval and Synthesis Software Marsyas,¹² the FEAPI framework [99], the MIRtoolbox,¹³ and the CLAM framework [100]. As for sound, there are hardly any dedicated extractors available. In general, very few feature extraction utilities exist that unite features from all audio domains, i.e., speech, music, and sound.
6.5.1 openSMILE's Architecture
This section introduces openSMILE's architecture as seen in Fig. 6.10.¹⁴ To provide comprehensive and standardised cross-domain feature sets, flexibility and extensibility, and incremental processing support, a number of requirements had to be met. First, incremental processing demands the ability to push audio data sample by sample from arbitrary input streams, such as files or the sound card, through the processing chain (cf. Fig. 6.11). Second, a ring-buffer memory for features is needed to provide temporal context modelling and/or buffering. For an efficient design, data such as FFT spectra must be reusable, so that multiple feature extractors do not compute it more than once (cf. Fig. 6.11). Algorithms should ideally be fast and 'lightweight'; to this end, they were implemented in C and C++ without third-party dependencies for the core functions. A modular basis further enables arbitrary combination of features and invites the research community to add new feature extractor components, given an application programming interface (API) and a run-time plug-in interface. To handle asynchronous feature streams, universal timing information is available for processing of feature frames. Finally, to ensure wide distribution and acceptance, platform independence seems mandatory. Apart
¹⁴ A more detailed description can be found in the openSMILE documentation available in the download package at http://sourceforge.net/projects/opensmile/.
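Three of the requirements named in this section (sample-wise pushing of audio data, a ring-buffer memory for features, and computing shared data such as spectra only once) can be illustrated with a minimal sketch. It is written in Python rather than openSMILE's C/C++, and every name in it (RingBuffer, Pipeline, push_sample, spectral_energy, peak_bin) is hypothetical and does not reflect openSMILE's actual API. Non-overlapping frames and a naive DFT are simplifying assumptions made only to keep the example short.

```python
import cmath
from collections import deque


class RingBuffer:
    """Fixed-capacity memory: once full, the oldest entry is overwritten."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)

    def push(self, item):
        self._data.append(item)

    def last(self, n):
        """Return up to the n most recent entries, oldest first."""
        if n <= 0:
            return []
        return list(self._data)[-n:]

    def __len__(self):
        return len(self._data)


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), acceptable for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# Two hypothetical feature extractors that both read the same spectrum.
def spectral_energy(spectrum):
    return sum(m * m for m in spectrum)


def peak_bin(spectrum):
    return max(range(len(spectrum)), key=spectrum.__getitem__)


class Pipeline:
    """Accepts one sample at a time and emits features per completed frame."""

    def __init__(self, frame_size, history):
        self.frame_size = frame_size
        self._pending = []                    # samples not yet forming a frame
        self.spectra = RingBuffer(history)    # shared data, written once per frame
        self.features = RingBuffer(history)   # one feature record per frame
        self.samples_seen = 0
        self.fft_calls = 0                    # counts spectrum computations

    def push_sample(self, sample):
        self._pending.append(sample)
        self.samples_seen += 1
        if len(self._pending) == self.frame_size:
            self._process_frame(self._pending)
            self._pending = []

    def _process_frame(self, frame):
        # The spectrum is computed once and reused by both extractors.
        spectrum = magnitude_spectrum(frame)
        self.fft_calls += 1
        self.spectra.push(spectrum)
        self.features.push({
            # Timing information: index of the frame's first sample.
            "start_sample": self.samples_seen - self.frame_size,
            "spectral_energy": spectral_energy(spectrum),
            "peak_bin": peak_bin(spectrum),
        })


# Usage: 28 samples with a period of 4, frames of 8 samples, history of 2 frames.
pipe = Pipeline(frame_size=8, history=2)
for s in [1, 0, -1, 0] * 7:
    pipe.push_sample(s)
print(pipe.fft_calls)                                         # 3
print([f["start_sample"] for f in pipe.features.last(2)])     # [8, 16]
```

In this sketch, three frames are completed, but the ring buffer retains only the two most recent feature records; the remaining four samples stay pending until further input arrives. The start_sample field stands in for the timing information attached to each feature frame.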