Detecting Speech Interruptions for Automatic Conflict Detection - Conflict and Multimodal Communication: Social Research and Machine Intelligence - page 376


in gray, and H-segments in black. The other rows contain the relabeling according to the various detectors. For the three rows {N, O}_x (x ∈ {1, 2, 5}), a segment is labeled O when it contains a part of overlap and, otherwise, N. For the three rows {N, L, H}_x (x ∈ {1, 2, 5}), overlap segments are labeled according to the conflict level of the clip: L for LLC-Ov and H for HLC-Ov.

Figure 18.5 gives an instance of metadata relabeling for the LLC clip #Train_0001. For this clip, an LLC-Ov occurs between 13.01 and 14.4 s. The relabeling is O for segments 14 and 15 of {N, O}_1, segments 7 and 8 of {N, O}_2, and segment 3 of {N, O}_5. The relabeling is L for segments 14 and 15 of {N, L, H}_1, segments 7 and 8 of {N, L, H}_2, and segment 3 of {N, L, H}_5.

Figure 18.6 gives an instance of metadata relabeling for the HLC clip #Train_0006. For this clip, HLC-Ovs occur between 14.9 and 18.9 s and between 28.3 and 30 s. The relabeling is O for segments 15, 16, 17, 18, 19, 29, and 30 of {N, O}_1, for segments 8, 9, 10, and 15 of {N, O}_2, and for segments 3, 4, and 6 of {N, O}_5. The relabeling is H for segments 15, 16, 17, 18, 19, 29, and 30 of {N, L, H}_1, for segments 8, 9, 10, and 15 of {N, L, H}_2, and for segments 3, 4, and 6 of {N, L, H}_5.
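The relabeling rule illustrated by the two clips above can be sketched in a few lines of Python. This is only a minimal sketch, not the authors' implementation: the function names `overlap_segments` and `relabel` are hypothetical, segments are assumed to be numbered from 1 and to tile the clip without gaps, and the 30 s clip duration used in the usage note is an assumption.

```python
import math

def overlap_segments(overlaps, seg_dur):
    """Return the sorted 1-based indices of fixed-length segments
    that contain any part of an overlap interval (start, end) in seconds."""
    hit = set()
    for start, end in overlaps:
        first = math.floor(start / seg_dur) + 1  # segment holding the overlap start
        last = math.ceil(end / seg_dur)          # segment holding the overlap end
        hit.update(range(first, last + 1))
    return sorted(hit)

def relabel(overlaps, seg_dur, clip_dur, overlap_label="O"):
    """Label every segment of a clip: overlap_label if it contains overlap, else 'N'.
    Use overlap_label='L' or 'H' for the {N, L, H} detectors."""
    n_seg = math.ceil(clip_dur / seg_dur)
    hits = set(overlap_segments(overlaps, seg_dur))
    return [overlap_label if k in hits else "N" for k in range(1, n_seg + 1)]

# Overlap intervals quoted in the text for the two example clips.
train_0001 = [(13.01, 14.4)]
train_0006 = [(14.9, 18.9), (28.3, 30.0)]
```

For example, `overlap_segments(train_0001, 2)` yields segments 7 and 8, and `relabel(train_0001, 5, 30, "L")` marks only the third 5 s segment as L, assuming a 30 s clip.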

18.4.2 Two-Class {N, O} Classifiers

Using this relabeling, three two-class SVMs ({N, O}_1, {N, O}_2, {N, O}_5) were trained on the Train set. Each SVM classifies a segment of a given duration (1, 2, or 5 s) as overlap (O) or Non-Ov (N). To account for the imbalanced class distribution, the over-represented category (N) was down-sampled by a given factor: 4 for the {N, O}_1 detector, 3 for the {N, O}_2 detector, and 2 for the {N, O}_5 detector. We investigated the effects of different feature sets on the accuracy of the overlapping speech detection.
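The down-sampling step described above can be illustrated as follows. The text does not say how the retained N segments were selected, so this sketch assumes uniform random selection with a fixed seed. The helper `downsample_majority` and the `FACTORS` mapping are hypothetical names introduced only for illustration.

```python
import random

# Down-sampling factors quoted in the text, keyed by segment duration in seconds.
FACTORS = {1: 4, 2: 3, 5: 2}

def downsample_majority(samples, labels, majority="N", factor=4, seed=0):
    """Keep roughly 1/factor of the majority-class items and all other items.
    Assumption: the kept majority items are drawn uniformly at random."""
    rng = random.Random(seed)
    maj_idx = [i for i, y in enumerate(labels) if y == majority]
    kept = set(rng.sample(maj_idx, len(maj_idx) // factor))
    pairs = [
        (x, y)
        for i, (x, y) in enumerate(zip(samples, labels))
        if y != majority or i in kept
    ]
    return [x for x, _ in pairs], [y for _, y in pairs]

# Toy data: 40 non-overlap and 5 overlap segments of 1 s each.
toy_labels = ["N"] * 40 + ["O"] * 5
toy_samples = list(range(len(toy_labels)))
ds_samples, ds_labels = downsample_majority(
    toy_samples, toy_labels, factor=FACTORS[1]
)
```

With the factor of 4 used for the 1 s detector, the toy set keeps 10 of its 40 N segments and all 5 O segments. The original order of the retained items is preserved.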

Table 18.4 gives the accuracy rates (N-Acc. and O-Acc. in %) of the two-class

Table 18.4 Accuracy rates of the {N, O} detectors on the Development set according to the feature sets. In bold, the best feature set

Detector   Feature set   N-Acc. (%)   O-Acc. (%)   UAR (%)
{N, O}_1   IS-2010       86.7         73.9         80.3
{N, O}_1   IS-2011       87.7         72.3         80.0
{N, O}_1   IS-2012       87.8         71.6         79.7
{N, O}_2   IS-2010       85.1         75.1         80.1
{N, O}_2   IS-2011       87.3         71.6         79.5
{N, O}_2   IS-2012       87.4         71.7         79.5
{N, O}_5   IS-2010       82.7         78.7         80.7
{N, O}_5   IS-2011       84.9         75.3         80.1
{N, O}_5   IS-2012       84.0         75.7         79.8
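The UAR column is consistent with the usual definition of unweighted average recall: the plain mean of the per-class accuracies, which gives N and O equal weight regardless of class size. The sketch below shows that computation under this assumption. The function names `per_class_recall` and `uar` are hypothetical, and the toy labels are made up for illustration.

```python
def per_class_recall(y_true, y_pred):
    """Return {label: fraction of items with that true label predicted correctly}."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {label: correct[label] / totals[label] for label in totals}

def uar(recalls):
    """Unweighted average recall: plain mean of the per-class recalls."""
    values = list(recalls)
    return sum(values) / len(values)

# First table row ({N, O}_1 with IS-2010): N-Acc. 86.7 and O-Acc. 73.9.
row_uar = uar([86.7, 73.9])

# Toy predictions (illustrative only): N recall 0.75, O recall 0.5.
toy_true = ["N", "N", "N", "N", "O", "O"]
toy_pred = ["N", "N", "N", "O", "O", "N"]
toy_recalls = per_class_recall(toy_true, toy_pred)
toy_uar = uar(toy_recalls.values())
```

Applied to the first row, the mean of 86.7 and 73.9 reproduces the reported 80.3 after rounding to one decimal place.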
