13.2 Best Practice Recommendations
In the following, best practice recommendations based on the content presented in
this book are given. These again follow the chain of processing from data provision
via enhancement, feature extraction, and classification or regression, to output
encoding for optimal system embedding.
High realism [84]: In order to evaluate systems for Intelligent Audio Analysis
in conditions close to real-life application, realistic data are needed [66]. However,
progress in this direction is often slow in the field, likely owing to the high
effort of collecting and annotating such data. Realism concerns in particular the
choice of testing instances: to assess an Intelligent Audio Analysis system's
performance in a realistic way, these may not be restricted to prototypical,
straightforward cases. If a pre-selection is applied at all, e.g., to obtain
performance bounds, it needs to be based on objective and transparent criteria
rather than on 'intuitive' expert selection. While methods such as semi-supervised
learning or synthesis of training material have been named in this book, they are
less suited for the collection of test instances.
Even crowd-sourcing may, depending on the task, be more appropriate for the
collection of training data if laymen are involved. Realism further touches
pre-processing such as chunking according to acoustic or symbolic, e.g.,
linguistic, criteria. In most real-life applications, chunking will be expected
to work automatically and should be oriented on acoustic LLDs. An example is
audio-activity-based chunking, which easily becomes challenging in reverberant
or noisy acoustic conditions. If additional meta-information or common knowledge
is exploited in the analysis process, the information should be retrieved
automatically from publicly available knowledge sources, e.g., by web-based
queries as was shown in this book, e.g., for chord lead sheets, lyrics, and word
information. If such information includes individual experts' knowledge of the
test cases, this may result in a considerable bias of the accuracies to be
expected for unseen material. Finally, real-life applications imply the highest
possible independence of training and test conditions in most cases. This can be
established by
partitioning into train, development, and test sets [63]. Today, however,
random cross-validation without a known random seed for partitioning is often
employed, especially in the case of small data sets, to ensure significance of
results. Using an independent and stratified subdivision according to simple
criteria (e.g., splitting by instance ID, speaker, or composer) is a transparent
alternative that preserves statistical significance. Otherwise, the random seed
should be provided together with the toolkit for reproduction of the partitions,
or a downloadable archive containing the instance or file list may be provided.
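Such a transparent, reproducible subdivision can be sketched as follows. This is a minimal illustration, not a prescription from the text: the function name, the split ratios, and the choice of speaker ID as the grouping criterion are assumptions for the example.

```python
import random

def speaker_independent_split(instance_to_speaker, seed=42,
                              ratios=(0.6, 0.2, 0.2)):
    """Partition instances into train/development/test sets by speaker.

    Illustrative sketch: all instances of one speaker land in the same
    partition (speaker independence), and the random seed is fixed so
    that the exact split can be reproduced and published.
    """
    # Sort first so the shuffle depends only on the seed, not on dict order.
    speakers = sorted(set(instance_to_speaker.values()))
    rng = random.Random(seed)  # fixed seed -> reproducible partitions
    rng.shuffle(speakers)
    n_train = int(ratios[0] * len(speakers))
    n_dev = int(ratios[1] * len(speakers))
    partition_of = {}
    for i, speaker in enumerate(speakers):
        if i < n_train:
            partition_of[speaker] = 'train'
        elif i < n_train + n_dev:
            partition_of[speaker] = 'dev'
        else:
            partition_of[speaker] = 'test'
    # Every instance inherits the partition of its speaker.
    return {inst: partition_of[spk]
            for inst, spk in instance_to_speaker.items()}
```

Publishing the seed, or simply the resulting instance list, then suffices for others to reproduce the partitions exactly.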
Standardised, multi-faceted and machine-aided data collection: Publicly
available audio data with rich annotation are still sparse [84]. Even with a
recently increasing number of available databases ready for experimentation,
these often come with different labelling schemes, such as discrete versus
continuous task representation. This can make cross-corpus evaluation and data
agglomeration [85] partly difficult. 'Translation' schemes and standards are
therefore needed to map from one task representation to another, or for the
task representation itself, and should be
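A simple instance of such a 'translation' scheme is thresholding a continuous label, e.g., a rating in [-1, 1], into discrete classes. The threshold values and class names below are illustrative assumptions, not a standard from the text:

```python
def discretise(value, thresholds=(-0.33, 0.33),
               labels=('low', 'neutral', 'high')):
    """Map a continuous label onto discrete classes by thresholding.

    Illustrative 'translation' between task representations: values up
    to thresholds[i] receive labels[i]; anything above the last
    threshold receives the final label.
    """
    for threshold, label in zip(thresholds, labels):
        if value <= threshold:
            return label
    return labels[-1]
```

Documenting such a mapping alongside a corpus makes discretely and continuously labelled data sets comparable in cross-corpus evaluation.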
 