multi-resolution framework of the overlap detectors is outlined, the relation of
the types of overlap with the conflict level is introduced and assessed on the
Development set, and the results are discussed according to the official measure
of the challenge in terms of the UAR. Section 18.5 describes the multi-expert
architecture of the conflict detector. Various audio features that are related to the
overlap detectors are presented. The results on the conflict detector task on the Test
set are discussed. Section 18.6 presents the study's conclusions.
18.2 Speech Material
The SSPNet corpus (Kim et al. 2012a) is an international reference among social signal
databases. In the context of political debates, this corpus allows conflict to be
investigated as it occurs during interactions between group members. We used SSPNet
in our study to analyze various turn-taking characteristics and to test models for
conflict level detection.
18.2.1 SSPNet Corpus
The “SSPNet Conflict Corpus” is a collection of 45 political debates in the French
language that were televised in Switzerland. It represents approximately 12 h of
speech signals; 1,430 audio clips of 30 s duration were extracted from the corpus.
A total of 157 individuals were speaking in the collection of debates (23 females
and 134 males). In each multiparty debate, the group members held distinct
roles: one member acted as moderator, and the other members were participants
in the debate. Four moderators (1 female, 3 males) and 153 participants
(22 females, 131 males) were counted in the database. The SSPNet corpus was
distributed for the
Interspeech 2013 ComParE Challenge. Data were split into the Train, Development,
and Test sets: 793 clips were in the Train set, 240 clips were in the Development set,
and 397 were in the Test set. Metadata are available for the Train and Development
sets.
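The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers stated in the text; the variable names are illustrative and are not part of the corpus distribution.

```python
# Consistency check on the corpus figures quoted in the text.
# All names below are illustrative; only the numbers come from the text.

CLIP_SECONDS = 30
SPLIT_SIZES = {"Train": 793, "Development": 240, "Test": 397}

# Speaker counts by role, as (females, males).
ROLES = {"moderator": (1, 3), "participant": (22, 131)}

total_clips = sum(SPLIT_SIZES.values())
total_hours = total_clips * CLIP_SECONDS / 3600

total_females = sum(f for f, _ in ROLES.values())
total_males = sum(m for _, m in ROLES.values())
total_speakers = total_females + total_males

# Share of clips that falls into each set.
split_share = {name: n / total_clips for name, n in SPLIT_SIZES.items()}

print(f"clips: {total_clips}, duration: {total_hours:.2f} h")
print(f"speakers: {total_speakers} ({total_females} females, {total_males} males)")
for name, share in split_share.items():
    print(f"{name}: {share:.1%}")
```

The three sets add up to the 1,430 clips, and 1,430 clips of 30 s give just under 12 h, in line with the "approximately 12 h" stated for the corpus.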
The clips were annotated by crowdsourcing with a conflict score in the range −10 to +10,
to model the perceptions of the data consumers at a nonverbal
level; a clip was labeled low-level conflict (LLC) when its score was lower
than 0; otherwise, it was labeled high-level conflict (HLC). Figure 18.1 shows
the distribution of the clips of the Train set as a function of the conflict score range
(CSR). The clips are split into the two conflict-level classes (LLC and HLC); the
dashed line shows the boundary between the LLC and HLC clips. LLC clips
predominate in the database (63 % LLC vs. 37 % HLC).
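The binarization rule described above (score lower than 0 gives LLC, otherwise HLC) can be sketched as follows. This is a minimal illustration, assuming the scores are available as plain numbers; the function names and the example scores are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def conflict_class(score):
    """Map a conflict score in [-10, +10] to 'LLC' or 'HLC'.

    Scores lower than 0 are low-level conflict (LLC); a score of 0
    or above is high-level conflict (HLC), as stated in the text.
    """
    if not -10 <= score <= 10:
        raise ValueError(f"score {score} is outside the range -10 to +10")
    return "LLC" if score < 0 else "HLC"


def class_shares(scores):
    """Return the fraction of clips in each class."""
    counts = Counter(conflict_class(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scores given")
    return {label: counts[label] / total for label in ("LLC", "HLC")}


# Hypothetical scores, for illustration only (not corpus data).
example_scores = [-7.5, -2.0, -0.1, 0.0, 3.4]
shares = class_shares(example_scores)
print(shares)
```

Note that a score of exactly 0 falls on the HLC side of the boundary, since only scores strictly lower than 0 are assigned to LLC.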
Segmentation metadata are available for each clip, indicating the diarization
(“who spoke when”). From these metadata, we can compute the following statistics: