Feature selection/generation: To further reduce the feature space dimensionality, in this step it is decided which features to keep in the feature space and which to discard. This may be of interest if a new task is not well known, e.g., estimation of a speaker's weight, body surface, race, or heart rate, of playing effects on a Cajon or Blues harp, or of a malfunction of a technical system from acoustic properties. In such a case, a multiplicity of features can be 'brute-forced'. From these, the ones well suited for the task at hand can be kept. Typically, a target function is defined first. In the case of 'open loop' selection, typical target functions are of information theoretic nature, such as information gain (IG), or of statistical nature, such as the correlation among features and of features with the target of the task at hand. In the case of 'closed loop' selection, the target function is the accuracy of the learning algorithm itself, which is to be maximised. In addition, a search function is usually needed, as an exhaustive search of the feature space is computationally hardly feasible. Such a search may start with an empty set, adding features in 'forward' direction, with the full set, deleting features in 'backward' direction, or bi-directionally, starting 'somewhere in the middle'. Often, randomness is injected, or the search is based entirely on random selection guided by principles such as evolutionary, i.e., genetic, algorithms. As the search usually accepts a sub-optimal solution in order to reduce computation effort, 'floating' is often added to overcome nesting effects [9, 10]. That is, in the case of forward search, (limited) backward steps are added to avoid a too 'greedy' search. This 'Sequential Forward Floating Search' is among the most popular in the field, as one typically searches for a small number of final features out of a large set. In addition, the generation of further feature variants can be considered within the selection of features, e.g., by applying single-feature or multiple-feature mathematical operations such as the logarithm or division, which can lead to a better representation in the feature space.
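As an illustration of such a wrapper-style ('closed loop') search, a minimal sketch of Sequential Forward Floating Selection in Python could look as follows; the nearest-neighbour classifier, cross-validated accuracy as the target function, and the simple cycle guard are assumptions made only for this example, not a prescription from the text.

```python
# Sketch of Sequential Forward Floating Selection (SFFS) in a 'closed loop'
# (wrapper) setting: the target function is the cross-validated accuracy of
# a learning algorithm restricted to the candidate feature subset.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def sffs(X, y, n_select, estimator=None, cv=5):
    """Return n_select feature indices chosen by forward floating search.

    X is assumed to be a (n_samples, n_features) NumPy array, y the labels.
    """
    estimator = estimator or KNeighborsClassifier()

    def score(indices):
        # 'Closed loop' target function: mean cross-validated accuracy.
        return cross_val_score(estimator, X[:, indices], y, cv=cv).mean()

    selected, remaining = [], set(range(X.shape[1]))
    best_at_size = {}  # best score seen per subset size (prevents cycling)

    while len(selected) < n_select:
        # Forward step: greedily add the single feature that helps most.
        best_add = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best_add)
        remaining.remove(best_add)
        best_at_size[len(selected)] = max(
            best_at_size.get(len(selected), -np.inf), score(selected))

        # Floating step: conditionally remove features again if a smaller
        # subset beats the best one seen so far at that size ('nesting').
        while len(selected) > 2:
            worst = max(selected,
                        key=lambda f: score([s for s in selected if s != f]))
            candidate = [s for s in selected if s != worst]
            if score(candidate) > best_at_size.get(len(candidate), -np.inf):
                selected = candidate
                remaining.add(worst)
                best_at_size[len(candidate)] = score(candidate)
            else:
                break
    return selected
```

The conditional backward ('floating') steps are what distinguish this sketch from a purely greedy forward search and what counter the nesting effect mentioned above.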
Parameter selection: Parameter selection 'fine-tunes' the learning algorithm. This can comprise optimisation of a learning algorithm's topology, initialisation, the type of functions, or step sizes in the learning phase, etc. Indeed, the performance of a machine learning algorithm can be significantly influenced by optimal or sub-optimal parametrisation. While this step is seldom carried out systematically beyond varying expert-picked 'typical' values, the most popular systematic approach is likely grid search. As for feature selection, it is crucial not to 'tune' on instances used for evaluation, as this would obviously lead to an overestimation of performance.
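A minimal sketch of such a grid search, assuming scikit-learn and a support vector machine with the parameters C and gamma as purely illustrative tuning targets, could look as follows; the decisive point is that the grid is evaluated by cross-validation on the training partition only, while the held-out test partition serves solely for the final performance estimate.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Illustrative data; in Intelligent Audio Analysis, X would hold acoustic features.
X, y = load_iris(return_X_y=True)

# Hold back a test partition that the 'tuning' never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Grid search: every combination of the candidate parameter values is
# evaluated by cross-validation on the training partition only.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
# Performance is estimated on instances not used for tuning,
# avoiding the overestimation mentioned above.
print("held-out accuracy:", search.score(X_test, y_test))
```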
Model learning: This is the actual training phase, in which the classifier or regressor model is built based on labelled data. There are classifiers and regressors that do not need this phase (so-called 'lazy learners'), as they decide only at run-time, based on the training instances' properties, which class to choose, e.g., by the training instance with the shortest distance in the feature space to the instance under test. However, these are seldom used, as they typically do not reach sufficient accuracy in the rather complex tasks of Intelligent Audio Analysis and are usually slow and memory-consuming at run-time.
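For illustration, a minimal sketch of such a lazy learner, a one-nearest-neighbour classifier written in plain NumPy, could look as follows; the class name and the toy data are only assumptions made for this example.

```python
import numpy as np


class OneNearestNeighbour:
    """Lazy learner: no model is built, training merely stores the data."""

    def fit(self, X, y):
        # 'Training' is just memorising all labelled instances.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        predictions = []
        for x in X:
            # All work happens at run-time: compute the Euclidean distance
            # to every stored instance and copy the label of the closest one.
            distances = np.linalg.norm(self.X_train - x, axis=1)
            predictions.append(self.y_train[np.argmin(distances)])
        return np.array(predictions)


# Toy usage: two 2-D classes.
clf = OneNearestNeighbour().fit([[0, 0], [0, 1], [5, 5], [6, 5]],
                                [0, 0, 1, 1])
print(clf.predict([[0.2, 0.3], [5.5, 5.1]]))  # -> [0 1]
```

All computational effort is deferred to prediction time, which is exactly why such learners tend to be slow and memory-consuming at run-time on larger data sets.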
Classification/regression: This step assigns the actual target to an unknown test instance. In the case of classification, these targets are discrete labels. In the case of regression, the output is a continuous value. In general, a high diversity of approaches exists in the field of classification and regression.