Digital Signal Processing Reference
Thus, to achieve a reliable gold standard close to the ground truth, several
annotators (or labellers, raters) are usually employed; the less certain the task,
the more annotators are needed. Several measures quantify the agreement among
labellers. If the task is modelled continuously, such as the likability of a speaker
on a continuous scale or tempo in beats per minute (BPM), the correlation or the
mean linear/absolute error (MLE, MAE) among labellers is frequently used.
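As an illustration of the continuous case, the following minimal sketch computes the pairwise correlation and MAE among labellers. The function names, the dictionary layout, and the tempo values are hypothetical and chosen only for this example; they are not taken from the source.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length label sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # Undefined (division by zero) if one labeller gives constant labels.
    return cov / math.sqrt(var_a * var_b)


def mean_absolute_error(a, b):
    """Mean absolute difference between two equal-length label sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pairwise_agreement(ratings):
    """Map each pair of labellers to (correlation, MAE).

    `ratings` is an assumed layout: labeller name -> list of continuous labels,
    with all lists referring to the same instances in the same order.
    """
    return {
        (i, j): (pearson(ratings[i], ratings[j]),
                 mean_absolute_error(ratings[i], ratings[j]))
        for i, j in combinations(sorted(ratings), 2)
    }


# Hypothetical tempo annotations in BPM for four instances.
tempo_bpm = {
    "rater_a": [120.0, 90.0, 140.0, 100.0],
    "rater_b": [122.0, 88.0, 138.0, 104.0],
    "rater_c": [60.0, 92.0, 141.0, 99.0],
}

if __name__ == "__main__":
    for pair, (cc, mae) in pairwise_agreement(tempo_bpm).items():
        print(pair, round(cc, 3), round(mae, 3))
```

The correlation captures whether two labellers rank and scale the instances alike, while the MAE also penalises a constant offset between them, so the two measures are usually inspected together.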
Further, labellers can be weighted individually in order to maximise their agreement
with the gold standard. The justification is that labellers may lose concentration
when they have to label large amounts of data, or may not take the labelling task
seriously throughout. The evaluator weighted estimator (EWE) as described in [7]
provides an elegant model for obtaining a weighted gold standard $y_{\mathrm{EWE},n}$:
$$
y_{\mathrm{EWE},n} = \frac{1}{\sum_{k=1}^{K} r_k} \sum_{k=1}^{K} r_k \, y_{n,k},
\tag{5.1}
$$
where the subscript $k = 1, \ldots, K$ represents the rater, $y_{n,k}$ is the label of
rater $k$ for the instance $n$, and $r_k$ is an evaluator-dependent weight. The EWE's
average of the individual evaluators' responses thus takes into account that each
evaluator is subject to an individual amount of disturbance during evaluation:
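$$
r_k = \frac{\sum_{n=1}^{N} \left( y_{n,k} - \frac{1}{N} \sum_{n'=1}^{N} y_{n',k} \right) \left( \bar{y}_n - \frac{1}{N} \sum_{n'=1}^{N} \bar{y}_{n'} \right)}
{\sqrt{\sum_{n=1}^{N} \left( y_{n,k} - \frac{1}{N} \sum_{n'=1}^{N} y_{n',k} \right)^{2}} \; \sqrt{\sum_{n=1}^{N} \left( \bar{y}_n - \frac{1}{N} \sum_{n'=1}^{N} \bar{y}_{n'} \right)^{2}}}.
\tag{5.2}
$$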
These weights measure the correlation between the listener's estimations $y_{n,k}$ and
the average ratings of all evaluators, $\bar{y}_n$, where

$$
\bar{y}_n = \frac{1}{K} \sum_{k=1}^{K} y_{n,k}.
\tag{5.3}
$$
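The EWE computation described by Eqs. (5.1) to (5.3) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [7]: the function names, the list-of-rows layout `labels[n][k]`, and the example ratings are assumptions made here, and the sketch does not handle raters with constant labels or weights that sum to zero.

```python
import math


def mean_rating(labels):
    """Eq. (5.3): mean over the K raters for each instance n.

    `labels[n][k]` is the label of rater k for instance n (assumed layout).
    """
    return [sum(row) / len(row) for row in labels]


def rater_weights(labels):
    """Eq. (5.2): correlation of each rater's labels with the mean rating."""
    n_instances = len(labels)
    n_raters = len(labels[0])
    y_bar = mean_rating(labels)
    y_bar_mean = sum(y_bar) / n_instances
    weights = []
    for k in range(n_raters):
        y_k = [row[k] for row in labels]
        y_k_mean = sum(y_k) / n_instances
        numerator = sum((a - y_k_mean) * (b - y_bar_mean)
                        for a, b in zip(y_k, y_bar))
        denominator = (
            math.sqrt(sum((a - y_k_mean) ** 2 for a in y_k))
            * math.sqrt(sum((b - y_bar_mean) ** 2 for b in y_bar))
        )
        # Division by zero occurs if a rater's labels are constant.
        weights.append(numerator / denominator)
    return weights


def ewe(labels, weights):
    """Eq. (5.1): weighted gold standard for each instance n."""
    total = sum(weights)
    return [sum(w * y for w, y in zip(weights, row)) / total for row in labels]


# Hypothetical ratings: 4 instances (rows) by 3 raters (columns).
labels = [
    [1.0, 1.5, 3.0],
    [2.0, 2.5, 1.0],
    [3.0, 3.0, 4.0],
    [4.0, 4.5, 2.0],
]

if __name__ == "__main__":
    r = rater_weights(labels)
    print("weights:", [round(w, 3) for w in r])
    print("EWE:    ", [round(y, 3) for y in ewe(labels, r)])
```

In this made-up example the third rater deviates most from the mean rating and therefore receives the smallest weight, so the EWE leans towards the first two raters.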
The inter-evaluator agreement can be described by the correlation coefficients
(CCs) $r_k$ using Eq. (5.2) and by the standard deviations $\sigma_n$ of the assessments,

$$
\sigma_n = \sqrt{\frac{1}{K-1} \sum_{k=1}^{K} \left( y_{n,k} - y_{\mathrm{EWE},n} \right)^{2}}.
\tag{5.4}
$$
The standard deviation indicates how similarly an audio instance is perceived by the
human listeners. The inter-evaluator correlation measures the agreement among the
individual evaluators and thus focuses on the more general evaluation performance
[7]. If the weights are chosen to be constant across raters, the gold standard is the
simple mean of the raters' continuous labels $y_{n,k}$.
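The per-instance standard deviation of Eq. (5.4) can be sketched as follows. The function name `ewe_std` and the example labels are hypothetical; the example uses constant weights, in which case the gold standard passed in is the plain mean of the labels.

```python
import math


def ewe_std(labels_n, y_ewe_n):
    """Eq. (5.4): spread of the K labels of one instance around its EWE value.

    `labels_n` holds the labels y_{n,k} of all K raters for instance n;
    `y_ewe_n` is the gold standard for that instance. Requires K >= 2.
    """
    k = len(labels_n)
    return math.sqrt(sum((y - y_ewe_n) ** 2 for y in labels_n) / (k - 1))


# Hypothetical labels of three raters for one instance.
labels_n = [2.0, 4.0, 6.0]

# With constant weights the gold standard is the simple mean of the labels.
gold = sum(labels_n) / len(labels_n)

if __name__ == "__main__":
    print(ewe_std(labels_n, gold))
```

A value of zero means that all raters gave the same label to the instance; larger values flag instances that the listeners perceived differently.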
 