Audio Data - Intelligent Audio Analysis


Thus, in order to achieve a reliable gold standard close to the ground truth, several annotators (or labellers, raters) are usually used; the less certain the task is, the more annotators are needed. Several measures exist to quantify the agreement among labellers. If the task is modelled continuously, such as the likability of a speaker on a continuous scale or tempo in beats per minute (BPM), the correlation or the mean linear/absolute error (MLE, MAE) among labellers is frequently used.
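As a minimal sketch of such agreement measures, the following Python snippet computes the pairwise Pearson correlation and the pairwise mean absolute error between labellers. The function name `pairwise_agreement`, the array layout (instances in rows, labellers in columns), and the BPM values are assumptions made for illustration only.

```python
import numpy as np


def pairwise_agreement(labels):
    """Pairwise agreement between labellers for continuous labels.

    labels: array of shape (N, K), N instances rated by K labellers
            (layout assumed for this sketch).
    Returns a (K, K) Pearson correlation matrix and a (K, K) MAE matrix.
    """
    labels = np.asarray(labels, dtype=float)
    n_raters = labels.shape[1]
    # Columns are treated as variables, i.e. one variable per labeller.
    corr = np.corrcoef(labels, rowvar=False)
    mae = np.zeros((n_raters, n_raters))
    for i in range(n_raters):
        for j in range(n_raters):
            mae[i, j] = np.mean(np.abs(labels[:, i] - labels[:, j]))
    return corr, mae


# Hypothetical tempo annotations in BPM: 4 excerpts, 3 labellers.
bpm = np.array([[120, 122, 118],
                [90, 92, 88],
                [140, 139, 143],
                [100, 101, 97]])
corr, mae = pairwise_agreement(bpm)
```

High off-diagonal correlations together with small MAE values would indicate that the labellers largely agree on the task.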

Further, labellers can be weighted individually in order to reach the highest agreement between them and the gold standard. The justification is that labellers may lose concentration if they have to label huge amounts of data, or may not take the labelling seriously at all times. The evaluator weighted estimator (EWE) as described in [7] provides an elegant model to reach a weighted gold standard $y_{\mathrm{EWE},n}$:

$$ y_{\mathrm{EWE},n} = \frac{1}{\sum_{k=1}^{K} r_k} \sum_{k=1}^{K} r_k \, y_{n,k}, \tag{5.1} $$

where the subscript $k$ represents the rater with $k = 1, \ldots, K$, $y_{n,k}$ is the label of rater $k$ for the instance $n$, and $r_k$ is an evaluator-dependent weight. The EWE's average of the individual evaluators' responses thus takes into account the fact that each evaluator is subject to an individual amount of disturbance during evaluation:

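$$ r_k = \frac{\sum_{n=1}^{N} \left( y_{n,k} - \frac{1}{N} \sum_{n'=1}^{N} y_{n',k} \right) \left( \bar{y}_n - \frac{1}{N} \sum_{n'=1}^{N} \bar{y}_{n'} \right)}{\sqrt{\sum_{n=1}^{N} \left( y_{n,k} - \frac{1}{N} \sum_{n'=1}^{N} y_{n',k} \right)^2} \, \sqrt{\sum_{n=1}^{N} \left( \bar{y}_n - \frac{1}{N} \sum_{n'=1}^{N} \bar{y}_{n'} \right)^2}}. \tag{5.2} $$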

These weights measure the correlation between the listener's estimations $y_{n,k}$ and the average ratings of all evaluators, $\bar{y}_n$, where

$$ \bar{y}_n = \frac{1}{K} \sum_{k=1}^{K} y_{n,k}. \tag{5.3} $$
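The EWE computation can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation of [7]: the function name `ewe`, the array layout (instances in rows, raters in columns), and the ratings are assumptions. It further assumes that no rater gives constant labels and that the weights do not sum to zero, since either case would cause a division by zero.

```python
import numpy as np


def ewe(labels):
    """Evaluator weighted estimator for continuous labels.

    labels: array of shape (N, K), N instances rated by K raters
            (layout assumed for this sketch).
    Returns the weighted gold standard (N,), the weights r_k (K,),
    and the plain mean rating per instance (N,).
    """
    labels = np.asarray(labels, dtype=float)
    # Mean rating of all evaluators per instance, Eq. (5.3).
    mean_rating = labels.mean(axis=1)
    # Centre each rater's labels and the mean ratings over the instances.
    labels_c = labels - labels.mean(axis=0)
    mean_c = mean_rating - mean_rating.mean()
    # Correlation of each rater with the mean rating, Eq. (5.2).
    numerator = labels_c.T @ mean_c
    denominator = np.sqrt((labels_c ** 2).sum(axis=0)) * np.sqrt((mean_c ** 2).sum())
    weights = numerator / denominator
    # Weighted average of the raters' labels, Eq. (5.1).
    gold = labels @ weights / weights.sum()
    return gold, weights, mean_rating


# Hypothetical likability ratings: 5 instances, 3 raters.
# The third rater deviates most from the other two.
ratings = np.array([[1, 1, 3],
                    [2, 2, 1],
                    [3, 3, 4],
                    [4, 4, 2],
                    [5, 5, 5]])
gold, weights, mean_rating = ewe(ratings)
```

In this example the third rater correlates less with the mean rating than the first two and therefore receives a smaller weight, so the weighted gold standard leans towards the raters who agree with the group.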

The inter-evaluator agreement can be described by the correlation coefficients (CCs) $r_k$ using Eq. (5.2) and by the standard deviations $\sigma_n$ of the assessments,

$$ \sigma_n = \sqrt{\frac{1}{K-1} \sum_{k=1}^{K} \left( y_{n,k} - y_{\mathrm{EWE},n} \right)^2}. \tag{5.4} $$
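The per-instance standard deviation might be computed as in the following Python sketch. It assumes a normalisation by 1/(K − 1) and takes the gold standard values as an input; the function name `rating_std`, the array layout, and the numbers are hypothetical and serve illustration only.

```python
import numpy as np


def rating_std(labels, gold):
    """Standard deviation of the K ratings around the gold standard.

    labels: array of shape (N, K), N instances rated by K raters.
    gold:   array of shape (N,), gold standard value per instance
            (for example the EWE value).
    Assumes a 1 / (K - 1) normalisation, so K must be at least 2.
    """
    labels = np.asarray(labels, dtype=float)
    gold = np.asarray(gold, dtype=float)
    n_raters = labels.shape[1]
    squared_dev = (labels - gold[:, None]) ** 2
    return np.sqrt(squared_dev.sum(axis=1) / (n_raters - 1))


# Hypothetical example: the raters agree on the first instance
# and disagree on the second one.
labels = np.array([[1.0, 1.0, 1.0],
                   [1.0, 2.0, 3.0]])
gold = np.array([1.0, 2.0])
sigma = rating_std(labels, gold)
```

A value of zero means that all raters gave the gold standard value for that instance; larger values mark instances that the listeners perceived differently.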

The standard deviation indicates how similarly an audio instance is perceived by the human listeners. The inter-evaluator correlation measures the agreement among the individual evaluators and thus focuses on the more general evaluation performance [7]. If the weights are chosen constant among raters, the gold standard is the simple mean of the raters' continuous labels $y_{n,k}$.
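This special case can be checked directly. Assuming every rater receives the same weight $r_k = c$, where the non-zero constant $c$ is a symbol introduced here for illustration, the weighted average of Eq. (5.1) reduces to:

$$ y_{\mathrm{EWE},n} = \frac{1}{\sum_{k=1}^{K} c} \sum_{k=1}^{K} c \, y_{n,k} = \frac{c}{K c} \sum_{k=1}^{K} y_{n,k} = \frac{1}{K} \sum_{k=1}^{K} y_{n,k}, $$

which is the mean rating of Eq. (5.3), independent of the value of $c$.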
