Digital Signal Processing Reference
In categorical modelling, the majority vote among the individual ratings y_{n,k}
of the raters is usually taken. A variety of measures can be employed for
agreement evaluation, such as Krippendorff's alpha [8] or (Cohen's) kappa [9]. As a
continuum can be discretised, these statistics can also be used in this case, often
with linear or quadratic weighting. In the following, we will consider exclusively
kappa, which is defined as follows:
\[ \kappa = \frac{p_0 - p_c}{1 - p_c}, \tag{5.5} \]
where p_0 is the measured agreement between two labellers and p_c is the chance level
of agreement. If the labellers agree throughout, κ equals 1. If they agree only at the
level that chance would produce, then κ equals 0. Negative values indicate systematic
disagreement. According to [10], values of 0.4 to 0.6 indicate moderate agreement, and
values above 0.6 are considered good to excellent agreement. The two-rater statistic
defined here is known as Cohen's kappa [9]; the extension to several raters is known as
Fleiss's kappa, and linear and quadratic weighting are commonly used in the case of
ordinal-scaled class properties [11].
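As an illustration of the definition above, the following is a minimal Python sketch of majority voting and of Cohen's kappa for two raters. The function names (majority_vote, cohen_kappa) and the parameters (labels, weighting) are hypothetical and chosen only for this example. The weighted variants use the common disagreement-weight formulation, which is an assumption rather than something the text specifies; without weighting, the result reduces to (p_0 - p_c) / (1 - p_c).

```python
from collections import Counter


def majority_vote(ratings):
    """Most frequent label among the ratings of one instance.

    Ties are resolved in favour of the label seen first (an arbitrary choice).
    """
    return Counter(ratings).most_common(1)[0][0]


def cohen_kappa(a, b, labels, weighting=None):
    """Cohen's kappa between two raters' label sequences a and b.

    labels    -- all classes, in ordinal order if weighting is used
    weighting -- None (unweighted), "linear" or "quadratic"
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    if weighting not in (None, "linear", "quadratic"):
        raise ValueError("weighting must be None, 'linear' or 'quadratic'")
    k = len(labels)
    if k < 2:
        raise ValueError("at least two classes are required")

    index = {label: i for i, label in enumerate(labels)}
    n = len(a)

    # Joint distribution of the two raters' decisions (relative frequencies).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1.0 / n

    # Marginal label distributions of rater a (rows) and rater b (columns).
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        """Disagreement weight: 0 on the diagonal, growing with class distance."""
        if weighting is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_dis = sum(weight(i, j) * observed[i][j] for i, j in cells)
    expected_dis = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if expected_dis == 0.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    # Unweighted case: 1 - (1 - p_0) / (1 - p_c) = (p_0 - p_c) / (1 - p_c).
    return 1.0 - observed_dis / expected_dis


if __name__ == "__main__":
    rater_a = ["low", "low", "high", "high"]
    rater_b = ["low", "high", "low", "high"]
    print(cohen_kappa(rater_a, rater_b, labels=["low", "high"]))
    print(majority_vote(["low", "high", "high"]))
```

Working with disagreement weights keeps the unweighted, linear, and quadratic cases in one code path. For more than two raters, a different statistic such as Fleiss's kappa would be needed; it is not covered by this sketch.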
In order to demonstrate typical data collection for Intelligent Audio Analysis,
three examples are picked in the following: one each from speech, music, and general
sound data.
5.3 Exemplary Databases
5.3.1 Example in Speech: TUM AVIC
Let us first exemplify the collection of speech data in the context of determining
speaker interest. This task particularly demonstrates the difficulty of collecting
diverse data: various levels of interest need to be captured in a realistic setting.
In TUM's Audiovisual Interest Corpus (TUM AVIC), as described in detail in [12],
an experimenter and a subject are sitting on opposite sides of a desk. The experimenter
plays the role of a product presenter and leads the subject through a commercial
presentation. The subject's role is to listen to explanations and topic presentations of
the experimenter, ask questions of interest to her/him, and actively interact with
the experimenter according to his/her interest in the topics addressed. The subject
was explicitly asked not to worry about being polite to the experimenter, e.g., by
always showing a certain level of 'polite' attention. Voice data was recorded by two
microphones, one headset and one far-field microphone. Recordings were stored
at 44.1 kHz, 16 bit. Twenty-one subjects took part in the recordings, three of them Asian,
the remaining European. The language throughout the experiments is English, and all
subjects are non-native, yet very experienced, English speakers.
More details on the subjects are summarised in Table 5.2.
 