Digital Signal Processing Reference
Other related feature extraction tools used for speech research include the
Hidden Markov Model Toolkit (HTK) [20], the PRAAT software [95], the Speech
Filing System⁷ (SFS), the Auditory Toolbox,⁸ a Matlab™ toolbox⁹ by Raul Fernandez
[96], the Tracter framework [97], and the SNACK¹⁰ package for the Tcl scripting
language. However, not all of these tools are distributed under a permissive
open-source license; HTK and SFS, for example, are not. The SNACK package has been
unsupported since 2004.
For Music Information Retrieval, many feature extraction programs under a
permissive open-source license exist, e.g., the lightweight ANSI C library libXtract,¹¹
the Java-based jAudio extractor [98], the Music Analysis, Retrieval and Synthesis
Software Marsyas,¹² the FEAPI framework [99], the MIRtoolbox,¹³ and the CLAM
framework [100]. For general sound, there are hardly any dedicated extractors available.
Overall, very few feature extraction utilities exist that unite features from all audio
domains, i.e., speech, music, and sound.
6.5.1 openSMILE's Architecture
This section introduces openSMILE's architecture as shown in Fig. 6.10.¹⁴
To provide comprehensive and standardised cross-domain feature sets, flexibility
and extensibility, and incremental processing support, a number of requirements
had to be met. First, incremental processing requires that audio data from arbitrary
input streams, such as files or the sound card, can be pushed sample by sample
through the processing chain (cf. Fig. 6.11). Second, a ring-buffer memory for
features is needed to provide temporal context modelling and/or buffering. For an
efficient design, data such as FFT spectra must be reusable so that multiple feature
extractors do not compute it more than once (cf. Fig. 6.11). Algorithms should be
fast and 'lightweight'; they were therefore implemented in C and C++ without
third-party dependencies for the core functions. A modular basis further enables
arbitrary combination of features and invites the research community to add new
feature extractor components through an application programming interface (API)
and a run-time plug-in interface. To handle asynchronous feature streams, universal
timing information is available for the processing of feature frames. Finally, to ensure
wide distribution and acceptance, platform independence seems mandatory. Apart
7 http://www.phon.ucl.ac.uk/resource/sfs/
8 http://cobweb.ecn.purdue.edu/malcolm/interval/1998-010/
9 http://affect.media.mit.edu/publications.php
10 http://www.speech.kth.se/snack/
11 http://libxtract.sourceforge.net/
12 http://marsyas.sness.net/
13 https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox
14 A more detailed description can be found in the openSMILE documentation available in the
download package at http://sourceforge.net/projects/opensmile/.
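The requirements listed in Sect. 6.5.1 (incremental pushing of samples, a ring-buffer memory for feature frames, reuse of computed data by several extractors, and timing information per frame) can be illustrated with a minimal Python sketch. It is not openSMILE's implementation or API: the names FrameRingBuffer, Framer, moving_average and delta are hypothetical, and frame energy merely stands in for a shared low-level result such as an FFT spectrum.

```python
from collections import deque


class FrameRingBuffer:
    """Hypothetical fixed-capacity ring buffer for feature frames."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest entry once it is full
        self.frames = deque(maxlen=capacity)
        self.next_index = 0  # running index of the next frame to be written

    def push(self, value, timestamp):
        self.frames.append((self.next_index, timestamp, value))
        self.next_index += 1

    def latest(self, n):
        """Return up to n most recent (index, timestamp, value) entries, oldest first."""
        if n <= 0:
            return []
        return list(self.frames)[-n:]


class Framer:
    """Hypothetical component: accepts samples in arbitrary chunks and
    writes one energy value per overlapping frame into a sink buffer."""

    def __init__(self, frame_size, hop_size, sample_rate, sink):
        self.frame_size = frame_size
        self.hop_size = hop_size
        self.sample_rate = sample_rate
        self.sink = sink
        self.pending = []   # samples not yet fully consumed
        self.consumed = 0   # samples already dropped from the front of pending

    def push_samples(self, samples):
        self.pending.extend(samples)
        while len(self.pending) >= self.frame_size:
            frame = self.pending[:self.frame_size]
            energy = sum(x * x for x in frame) / self.frame_size
            # The timestamp is the frame start in seconds, shared by all readers
            self.sink.push(energy, self.consumed / self.sample_rate)
            del self.pending[:self.hop_size]
            self.consumed += self.hop_size


# Two readers reuse the same buffered values instead of recomputing them.
def moving_average(buffer, n):
    values = [value for _, _, value in buffer.latest(n)]
    return sum(values) / len(values)


def delta(buffer):
    recent = buffer.latest(2)
    return recent[-1][2] - recent[0][2]


buf = FrameRingBuffer(capacity=4)
framer = Framer(frame_size=4, hop_size=2, sample_rate=8, sink=buf)
signal = [1.0] * 8 + [3.0] * 8

# Feed the signal in chunks of three samples, as a live input stream might
for start in range(0, len(signal), 3):
    framer.push_samples(signal[start:start + 3])

print(buf.latest(4))
print(moving_average(buf, 4), delta(buf))
```

The chunk size of three samples is deliberately unrelated to the frame size, to show that framing does not depend on how the input arrives. Because the buffer keeps only the four most recent frames, memory use stays bounded regardless of stream length, while each reader still has the temporal context it needs.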
 