Audio Features - Intelligent Audio Analysis


Other related feature extraction tools used for speech research include, e.g., the Hidden Markov Model Toolkit (HTK) [20], the PRAAT Software [95], the Speech Filing System 7 (SFS), the Auditory Toolbox, 8 a Matlab™ toolbox 9 by Raul Fernandez [96], the Tracter framework [97], and the SNACK 10 package for the Tcl scripting language. However, not all of these tools are distributed under a permissive open-source license; HTK and SFS, for example, are not. The SNACK package has been unsupported since 2004.

For Music Information Retrieval, many feature extraction programs under a permissive open-source license exist, e.g., the lightweight ANSI C library libXtract, 11 the Java-based jAudio extractor [98], the Music Analysis, Retrieval and Synthesis Software Marsyas, 12 the FEAPI framework [99], the MIRtoolbox, 13 and the CLAM framework [100]. As for sound, there are hardly any dedicated extractors available. In general, very few feature extraction utilities exist that unite features from all audio domains, i.e., speech, music, and sound.

6.5.1 openSMILE's Architecture

This section introduces openSMILE's architecture as seen in Fig. 6.10. 14

To provide comprehensive and standardised cross-domain feature sets, flexibility and extensibility, and incremental processing support, a number of requirements had to be met. First, incremental processing demands the ability to push audio data sample by sample from arbitrary input streams, such as files or the sound card, through the processing chain (cf. Fig. 6.11). Second, a ring-buffer memory for features is needed to provide temporal context modelling and/or buffering. Third, for an efficient design, data such as FFT spectra must be reusable so that multiple feature extractors do not compute them more than once (cf. Fig. 6.11). Algorithms should ideally be fast and 'lightweight', and were therefore implemented in C and C++ without third-party dependencies for the core functions. A modular basis further enables arbitrary combination of features and invites the research community to add new feature extractor components, given an application programming interface (API) and a run-time plug-in interface. To handle asynchronous feature streams, universal timing information is available for the processing of feature frames. Finally, to ensure wide distribution and acceptance, platform independence seems mandatory. Apart
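The requirements listed above (sample-wise pushing, a ring-buffer memory, reuse of shared data such as spectra, and timing information per frame) can be illustrated with a small sketch. It is written in Python for brevity and is not openSMILE's API or implementation: the names RingBuffer, Pipeline, push_sample, and the two example extractors are hypothetical, and a naive DFT stands in for an FFT. The sketch only assumes non-overlapping frames of a fixed size.

```python
import cmath
import math
from collections import deque


class RingBuffer:
    """Fixed-capacity store of frames; the oldest frame is overwritten first."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)
        self.written = 0  # total number of frames ever pushed

    def push(self, frame):
        self.frames.append(frame)
        self.written += 1

    def read(self, index):
        """Return the frame with absolute index `index`, or None if it is
        not yet written or has already been overwritten."""
        oldest = self.written - len(self.frames)
        if index < oldest or index >= self.written:
            return None
        return self.frames[index - oldest]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (a slow stand-in for an FFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_energy(spectrum):
    """Sum of squared magnitudes over the stored bins."""
    return sum(m * m for m in spectrum)


def spectral_centroid(spectrum):
    """Magnitude-weighted mean bin index; 0.0 for an all-zero spectrum."""
    total = sum(spectrum)
    return sum(k * m for k, m in enumerate(spectrum)) / total if total > 0 else 0.0


class Pipeline:
    """Hypothetical processing chain: samples in, timestamped features out."""

    def __init__(self, frame_size, sample_rate, capacity=4):
        self.frame_size = frame_size
        self.sample_rate = sample_rate
        self.pending = []                    # samples of the frame being filled
        self.spectra = RingBuffer(capacity)  # shared memory for spectra
        self.spectrum_computations = 0

    def push_sample(self, sample):
        """Accept one sample. Returns (frame start time in seconds, features)
        when a frame is completed, otherwise None."""
        self.pending.append(sample)
        if len(self.pending) < self.frame_size:
            return None
        index = self.spectra.written
        # The spectrum is computed once per frame and stored in the ring buffer.
        self.spectra.push(magnitude_spectrum(self.pending))
        self.spectrum_computations += 1
        self.pending = []
        # Both extractors read the same stored spectrum instead of recomputing it.
        spectrum = self.spectra.read(index)
        start_time = index * self.frame_size / self.sample_rate
        features = {
            "energy": spectral_energy(spectrum),
            "centroid": spectral_centroid(spectrum),
        }
        return start_time, features


def demo():
    """Push two frames of a cosine centred on bin 2, one sample at a time."""
    pipeline = Pipeline(frame_size=8, sample_rate=8)
    results = []
    for t in range(16):
        out = pipeline.push_sample(math.cos(2 * math.pi * 2 * t / 8))
        if out is not None:
            results.append(out)
    return pipeline, results
```

Calling demo() returns the pipeline and one (timestamp, features) pair per completed frame. The counter spectrum_computations shows that each spectrum is computed once even though two extractors consume it, and the absolute frame index is what allows a start time to be attached to every feature frame.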

7 http://www.phon.ucl.ac.uk/resource/sfs/

9 http://affect.media.mit.edu/publications.php

10 http://www.speech.kth.se/snack/

11 http://libxtract.sourceforge.net/

12 http://marsyas.sness.net/

14 A more detailed description can be found in the openSMILE documentation available in the download package at http://sourceforge.net/projects/opensmile/.
