Audio Data - Intelligent Audio Analysis


Table 5.1 Requirements for database building

Requirement    Example
Quantity       “There's no data like more data”
               High diversity with respect to manifold influence factors
               Reasonably balanced distribution of instances among classes / range
               Knowledge of natural distribution among classes / range ('priors')
Quality        Adequate data
               Realistic data
               Ideal capture conditions
               Intended corruption
Modelling      Reasonable categorisation
               Well-defined mappings between models
Labelling      Unique and additional labelling (text+events, labeller tracks, context, etc.)
               High number of labellers
               Provision of gold standard's reliability
Release        Documentation of side conditions
               Additional perception tests
               Free release of the data with high accessibility
               Defined partitioning
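The quantity requirements on a reasonably balanced class distribution and on knowing the 'priors' can be checked with a few lines of code. The following Python sketch assumes the labels of a database are available as a flat list of class names; the function names class_priors and imbalance_ratio and the toy emotion labels are hypothetical and not taken from any particular toolkit.

```python
from collections import Counter

def class_priors(labels):
    """Relative frequency ('prior') of each class in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def imbalance_ratio(labels):
    """Count of the most frequent class divided by that of the least frequent one.

    1.0 means perfectly balanced; larger values mean stronger imbalance.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical emotion labels of a toy database
labels = ["neutral"] * 6 + ["anger"] * 3 + ["joy"] * 1
print(class_priors(labels))     # {'neutral': 0.6, 'anger': 0.3, 'joy': 0.1}
print(imbalance_ratio(labels))  # 6.0
```

Reporting such figures alongside a release makes the natural distribution explicit, which matters when a classifier's decisions are later weighted by class priors.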

synthesised training material was shown to be highly beneficial in cross-corpus testing, i.e., using a different database for training than for testing.
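The cross-corpus protocol can be sketched as an enumeration of training/test assignments in which the test corpus never appears on the training side. The Python sketch below assumes corpora are referred to by name; the corpus names and the optional synthesised set are hypothetical placeholders, and synthesised material is deliberately added to the training side only, so that testing stays on real data.

```python
from itertools import permutations

def cross_corpus_pairs(corpora, synthesised=()):
    """Every (training, test) combination in which the two corpora differ.

    Optional synthesised material is added to the training side only.
    """
    return [([train, *synthesised], test) for train, test in permutations(corpora, 2)]

def leave_one_corpus_out(corpora, synthesised=()):
    """Train on all corpora but one (plus synthesised material), test on the held-out one."""
    return [([c for c in corpora if c != held_out] + list(synthesised), held_out)
            for held_out in corpora]

corpora = ["corpus_A", "corpus_B", "corpus_C"]  # hypothetical corpus names
for train, test in leave_one_corpus_out(corpora, synthesised=["synth"]):
    print(train, "->", test)
# ['corpus_B', 'corpus_C', 'synth'] -> corpus_A
# ['corpus_A', 'corpus_C', 'synth'] -> corpus_B
# ['corpus_A', 'corpus_B', 'synth'] -> corpus_C
```

Pairwise testing shows how well a model transfers from one specific database to another, whereas leave-one-corpus-out uses as much training material as possible for each held-out database.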

5.2 Ground Truth and Gold Standard

Often in Intelligent Audio Analysis, the gold standard is not reliable, i.e., the training and testing labels themselves may be erroneous. How much this matters depends strongly on the task. For example, the age of a speaker is usually known, whereas the emotion of a speaker is usually difficult to assess. Similarly, the tempo of a musical piece can be determined relatively reliably by human annotators, while the ballroom dance style of a pop or rock song may be ambiguous, as several styles often fit.
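For categorical tasks such as emotion, one common way to build a gold standard from several labellers and to state its reliability, in line with the labelling requirements of Table 5.1, is a majority vote over labellers combined with an inter-labeller agreement measure. The Python sketch below uses Fleiss' kappa as that measure; this choice, the function names and the example labels are assumptions for illustration, not a procedure prescribed in this section.

```python
from collections import Counter

def majority_vote(ratings):
    """Gold standard per item: the label given by most labellers.

    Ties are broken in favour of the label encountered first.
    """
    return [Counter(item).most_common(1)[0][0] for item in ratings]

def fleiss_kappa(ratings):
    """Fleiss' kappa; every item must be rated by the same number (>= 2) of labellers."""
    n = len(ratings[0])                                   # labellers per item
    N = len(ratings)                                      # number of items
    categories = sorted({lab for item in ratings for lab in item})
    counts = [Counter(item) for item in ratings]
    # Observed agreement per item: share of labeller pairs that agree
    p_items = [(sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
               for c in counts]
    p_obs = sum(p_items) / N
    # Chance agreement from the overall proportion of each category
    p_cat = [sum(c[cat] for c in counts) / (N * n) for cat in categories]
    p_chance = sum(p ** 2 for p in p_cat)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category occurs")
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical emotion labels: 4 utterances, 3 labellers each
ratings = [
    ["anger", "anger", "anger"],
    ["neutral", "neutral", "joy"],
    ["joy", "neutral", "joy"],
    ["neutral", "neutral", "neutral"],
]
print(majority_vote(ratings))           # ['anger', 'neutral', 'joy', 'neutral']
print(round(fleiss_kappa(ratings), 3))  # 0.467
```

A kappa well below 1, as in this toy example, signals that the majority-vote gold standard may deviate from the ground truth for a noticeable share of items.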

The terms 'ground truth' and 'gold standard' are often used more or less as synonyms in the literature. Here, we define the 'ground truth' as the actual truth, as measured on the ground, in contrast to the 'gold standard', which ideally is identical to the ground truth but may also be the (slightly) error-prone labelling as seen from the 'sky above'.² When interpreting results, one thus has to bear in mind that the reference is usually the gold standard and not necessarily the ground truth. This has a double impact: on the one hand, the learnt models are error-prone; on the other hand, the test results may over- or underestimate the models' actual performance.
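The effect on test results can be made concrete for a binary task under a simplifying assumption: the classifier agrees with the ground truth with probability p, the gold standard agrees with the ground truth with probability q, and the two err independently. The classifier then matches the gold standard when both are right or both are wrong, so the measured accuracy is m = pq + (1 - p)(1 - q) = (1 - q) + p(2q - 1). The Python sketch below (function names hypothetical) evaluates this and inverts it to estimate p from m, provided q is known and differs from 0.5.

```python
def measured_accuracy(p, q):
    """Expected accuracy against the gold standard (binary task).

    p: accuracy of the classifier w.r.t. the ground truth
    q: accuracy of the gold standard w.r.t. the ground truth
    Assumes classifier and labelling errors are independent.
    """
    # Match if both are right, or both are wrong (only one wrong class exists)
    return p * q + (1 - p) * (1 - q)

def true_accuracy(m, q):
    """Invert measured_accuracy: estimate p from the measured accuracy m."""
    if q == 0.5:
        raise ValueError("a gold standard at chance level carries no information")
    return (m - (1 - q)) / (2 * q - 1)

# A classifier that is right 90 % of the time, judged against a gold standard
# that is itself right only 80 % of the time:
print(round(measured_accuracy(0.9, 0.8), 2))  # 0.74
print(round(true_accuracy(0.74, 0.8), 2))     # 0.9
```

Under independence, label noise thus makes a model look worse than it is. If a model has instead learnt the labellers' systematic errors, the errors are correlated and the measured accuracy can exceed the true one, a case this formula does not cover.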

² The term 'ground truth' indeed originated in the fields of aerial photography and satellite imagery.
