political debates in French. It includes a rich set of socially relevant
annotations, such as turn-taking (who speaks when and how much), agreement
and disagreement between participants, and the role played by the people involved.
Each debate includes one moderator and two coalitions opposing one another on
the issues of the day. A subset of this database, composed of 45 debates with four
guests (two guests in each group) plus one moderator, has been annotated in terms
of conflict level. The debates have been segmented into uniform, non-overlapping
30-s clips, on the assumption that the level of conflict is stationary within each
clip.
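The segmentation into uniform, non-overlapping 30-s clips can be sketched as follows. This is a minimal illustration, not the corpus authors' tooling: the function name `clip_boundaries` is hypothetical, and dropping a trailing remainder shorter than 30 s is an assumption, since the text does not say how such remainders were handled.

```python
def clip_boundaries(duration_s, clip_len_s=30.0):
    """Return (start, end) times in seconds for uniform, non-overlapping clips.

    Assumption: a trailing remainder shorter than clip_len_s is dropped.
    """
    if clip_len_s <= 0:
        raise ValueError("clip_len_s must be positive")
    n_clips = int(duration_s // clip_len_s)
    return [(i * clip_len_s, (i + 1) * clip_len_s) for i in range(n_clips)]


# Example: a 95-s recording yields three full clips; the last 5 s are dropped.
clips = clip_boundaries(95)
```

Each clip then carries a single conflict label, which is reasonable only under the stationarity assumption stated above.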
The SC² corpus includes 138 subjects in total, 23 females (1 moderator and
22 participants) and 133 males (3 moderators and 120 participants). The clips
were annotated in terms of their conflict level in two ways by approximately 550
assessors recruited via Amazon Mechanical Turk. First, a continuous conflict score
in the range [−10, +10] was assigned to each clip, which allows a
straightforward regression task (score). Second, based on these scores, each
clip was classified as either high conflict or low conflict, depending on whether
its score is ≥0 or <0, respectively, thus giving rise to a classification
task (class).
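The mapping from the continuous score to the binary class can be sketched as below, using the intervals given with Table 19.1 ("low" for scores in [−10, 0), "high" for scores in [0, +10]). The function name `conflict_class` is hypothetical, and the range check is an added safeguard rather than something described in the text.

```python
def conflict_class(score):
    """Map a continuous conflict score in [-10, +10] to 'high' or 'low'.

    Scores >= 0 are labelled 'high'; scores < 0 are labelled 'low'.
    """
    if not -10.0 <= score <= 10.0:
        raise ValueError("score must lie in [-10, +10]")
    return "high" if score >= 0 else "low"


# Example: a score of exactly 0 falls on the 'high' side of the threshold.
labels = [conflict_class(s) for s in (-3.2, 0.0, 4.5)]
```

Because the class is derived from the score, the regression and classification tasks share the same underlying annotation.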
As several subjects occur in debates with different moderators, a truly speaker-
independent partitioning of the data is not possible. Since no participant (apart from
the moderators) occurs more than a couple of times (most occur only once),
the following strategy was adopted to reduce speaker dependence to a minimum:
All broadcasts with the female moderator (speaker no. 50) were assigned to the
training set. The development set consists of all broadcasts moderated by the (male)
speaker no. 153 and the test set comprises the rest of the broadcasts, containing all
remaining male moderators. This further ensures that the development and test sets
are similar in case the gender of the moderator should have an influence.
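The moderator-based split described above can be sketched as follows. Only the speaker numbers 50 (training) and 153 (development) come from the text; the function names, the `(broadcast_id, moderator_id)` input format, and the example IDs are hypothetical.

```python
TRAIN_MODERATOR = 50    # female moderator: all her broadcasts go to training
DEVEL_MODERATOR = 153   # male moderator: all his broadcasts go to development


def assign_partition(moderator_id):
    """Return the partition name for a broadcast, given its moderator."""
    if moderator_id == TRAIN_MODERATOR:
        return "train"
    if moderator_id == DEVEL_MODERATOR:
        return "devel"
    return "test"  # all remaining (male) moderators


def partition_broadcasts(broadcasts):
    """Group broadcast IDs by partition.

    broadcasts: iterable of (broadcast_id, moderator_id) pairs
    (a hypothetical input format chosen for this sketch).
    """
    parts = {"train": [], "devel": [], "test": []}
    for broadcast_id, moderator_id in broadcasts:
        parts[assign_partition(moderator_id)].append(broadcast_id)
    return parts


# Example with made-up broadcast IDs and two made-up moderator numbers (7, 99).
example = partition_broadcasts([("b1", 50), ("b2", 153), ("b3", 7), ("b4", 99)])
```

Splitting by moderator keeps every broadcast of a given moderator in one partition, although guests who appear under several moderators may still occur in more than one set.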
The resulting distribution of the data is shown in Table 19.1 along with the
respective binary class labels. Histograms of the continuous score ratings over the
partitions are depicted in Fig. 19.4.
Table 19.1 Partitioning of the SSPNet Conflict Corpus into train,
devel(opment), and test sets for binary classification
("low": [−10, 0), "high": [0, +10])

#        train   devel    test       Σ
Low        471     127     226     824
High       322     113     171     606
Σ          793     240     397   1,430
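As a quick sanity check, the marginal totals in Table 19.1 can be recomputed from the per-class counts, together with the share of "high" clips in each partition. This is a small sketch; the dictionary layout and function names are chosen here for illustration.

```python
# Per-class clip counts copied from Table 19.1.
COUNTS = {
    "low": {"train": 471, "devel": 127, "test": 226},
    "high": {"train": 322, "devel": 113, "test": 171},
}


def partition_totals(counts):
    """Sum the clip counts over classes for each partition."""
    partitions = next(iter(counts.values())).keys()
    return {p: sum(per_class[p] for per_class in counts.values()) for p in partitions}


def high_fraction(counts, partition):
    """Fraction of clips labelled 'high' within one partition."""
    total = sum(per_class[partition] for per_class in counts.values())
    return counts["high"][partition] / total


totals = partition_totals(COUNTS)
for name in ("train", "devel", "test"):
    print(f"{name}: {totals[name]} clips, {high_fraction(COUNTS, name):.1%} high")
```

The recomputed totals match the last row of the table, and the "high" share is below one half in every partition, so the two classes are moderately imbalanced.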
 