Table 18.5 Accuracy rates of the {N, L, H} detectors on the Development set according to the feature sets (UAR: unweighted average recall). The best detector is marked with an asterisk (*).

Detector      Feature set   N-Acc. (%)   L-Acc. (%)   H-Acc. (%)   UAR (%)
{N, L, H}_1   IS-2010       78.0         31.7         73.5         61.1
{N, L, H}_1   IS-2011       79.9         32.7         70.5         61.0
{N, L, H}_1   IS-2012       79.4         31.4         71.4         60.7
{N, L, H}_2   IS-2010       79.5         32.6         71.2         61.2
{N, L, H}_2   IS-2011       78.1         35.9         70.0         61.3
{N, L, H}_2   IS-2012       80.5         31.5         68.0         60.0
{N, L, H}_5   IS-2010       77.5         44.7         68.3         63.5*
{N, L, H}_5   IS-2011       76.4         40.0         67.7         61.4
{N, L, H}_5   IS-2012       80.8         38.2         67.4         62.1

we chose to use only the best three-class classifier, {N, L, H}_5 with the IS-2010 audio feature set. Our assumption is that only the best overlap classifier is relevant for the detection of conflict.
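As a check on Table 18.5, the UAR column can be recomputed as the unweighted mean of the three per-class accuracies. Done from the rounded table values, this reproduces every reported UAR to within 0.1 points. The minimal Python sketch below assumes that definition of UAR and selects the row with the highest mean. The dictionary `TABLE_18_5` and the helpers `uar` and `best_detector` are hypothetical names introduced for illustration only. The accuracies themselves are copied from the table.

```python
from statistics import mean

# Per-class accuracies (%) on the Development set, copied from Table 18.5.
# Key: (detector, feature set); value: (N-Acc., L-Acc., H-Acc.).
TABLE_18_5 = {
    ("{N, L, H}_1", "IS-2010"): (78.0, 31.7, 73.5),
    ("{N, L, H}_1", "IS-2011"): (79.9, 32.7, 70.5),
    ("{N, L, H}_1", "IS-2012"): (79.4, 31.4, 71.4),
    ("{N, L, H}_2", "IS-2010"): (79.5, 32.6, 71.2),
    ("{N, L, H}_2", "IS-2011"): (78.1, 35.9, 70.0),
    ("{N, L, H}_2", "IS-2012"): (80.5, 31.5, 68.0),
    ("{N, L, H}_5", "IS-2010"): (77.5, 44.7, 68.3),
    ("{N, L, H}_5", "IS-2011"): (76.4, 40.0, 67.7),
    ("{N, L, H}_5", "IS-2012"): (80.8, 38.2, 67.4),
}


def uar(per_class_acc):
    """Unweighted average recall (assumed definition): the plain mean of the
    per-class accuracies, so each class counts equally regardless of its size."""
    return mean(per_class_acc)


def best_detector(table):
    """Return the (detector, feature set) key with the highest UAR."""
    return max(table, key=lambda key: uar(table[key]))


if __name__ == "__main__":
    for (detector, features), accs in TABLE_18_5.items():
        print(f"{detector:12s} {features:8s} UAR = {uar(accs):.1f}")
    print("best:", best_detector(TABLE_18_5))
```

Running it selects ("{N, L, H}_5", "IS-2010"), the detector retained in the text. Because it is computed from rounded inputs, the recomputed UAR of {N, L, H}_2 with IS-2010 comes out as 61.1 rather than the reported 61.2.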
18.4.4 Audio Characteristics of Overlaps
Previous studies (Smolenski and Ramachandran 2011; Shokouhi et al. 2013) have shown that the audio characteristics of overlapping speech differ from those of speech produced by a single speaker. We looked for the discriminating cues (1) between Ov and Non-Ov and (2), more specifically, between HLC-Ov and LLC-Ov. For these investigations, we studied the 5-s segments of the Train set, because the 5-s-based {N, O} and {N, L, H} detectors gave the best accuracy results (see Tables 18.4 and 18.5, respectively, in Sect. 18.4). The 38 low-level descriptors (LLDs) of the IS-2010 feature set were used as audio characteristics. The relevance of the LLDs was analyzed with respect to two class pairs: Non-Ov/Ov, referred to as {N, O}, and HLC-Ov/LLC-Ov, referred to as {H, L}. For each LLD, the relevance is given by the information gain (Rauber and Steiger-Garção 1993). It is computed on the 5-s segments as

IG = H(class) - H(class | LLD),

where H is the Shannon entropy. The entropy was computed in four steps: (1) filtering the IS-2010 features according to a given LLD, (2) clustering the segments of the Train set using the filtered features, (3) computing the contingency table from the class and the cluster associated with each segment, and (4) estimating the entropy from the contingency table. Table 18.6 gives the information gain, computed on the Train set, of the five best-ranked LLDs (out of 38) for discriminating LLC-Ovs from HLC-Ovs. The most relevant LLDs are the logarithmic powers of the mel-frequency bands, more precisely of the high-frequency bands, and the normalized loudness. These results show that the two types of overlap differ acoustically, in particular in high-frequency band power and loudness.
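The four-step information-gain procedure can be sketched in Python as follows. This is a minimal illustration, and several of its details are our own assumptions rather than choices stated in the text:

- the clustering algorithm (plain k-means);
- the number of clusters `k`;
- base-2 logarithms;
- the feature layout, a dict keyed by hypothetical (LLD, functional) pairs.

All function names are hypothetical, and the toy data are invented. In this sketch, a segment's cluster index stands in for its LLD value when H(class | LLD) is estimated from the contingency table.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(contingency):
    """IG = H(class) - H(class | cluster), from a {cluster: Counter(class)} table."""
    class_totals = Counter()
    for row in contingency.values():
        class_totals.update(row)
    n = sum(class_totals.values())
    h_class = entropy(list(class_totals.values()))
    # Conditional entropy: each cluster's class entropy, weighted by cluster size.
    h_cond = sum(
        sum(row.values()) / n * entropy(list(row.values()))
        for row in contingency.values()
    )
    return h_class - h_cond


def filter_by_lld(segment, lld):
    """Step 1: keep only the features derived from one LLD, in a stable order.
    Hypothetical layout: segment = {(lld_name, functional_name): value}."""
    return [value for (name, _), value in sorted(segment.items()) if name == lld]


def kmeans(points, k, iters=20, seed=0):
    """Step 2 (assumed clusterer): Lloyd's k-means, one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def lld_information_gain(segments, classes, lld, k=2):
    """Steps 1-4 for one LLD: filter, cluster, tabulate, then compute IG."""
    points = [filter_by_lld(seg, lld) for seg in segments]
    clusters = kmeans(points, k)
    table = defaultdict(Counter)  # Step 3: contingency table {cluster: Counter(class)}
    for cls, cluster in zip(classes, clusters):
        table[cluster][cls] += 1
    return information_gain(table)  # Step 4


def rank_llds(segments, classes, llds, top=5):
    """Rank LLDs by information gain and keep the `top` best-ranked ones."""
    scores = {lld: lld_information_gain(segments, classes, lld) for lld in llds}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Invented toy data: loudness separates HLC-Ov from LLC-Ov, zcr does not.
    segs = [
        {("loudness", "amean"): 0.9, ("zcr", "amean"): 0.5},
        {("loudness", "amean"): 0.8, ("zcr", "amean"): 0.1},
        {("loudness", "amean"): 0.1, ("zcr", "amean"): 0.5},
        {("loudness", "amean"): 0.2, ("zcr", "amean"): 0.1},
    ]
    labels = ["HLC-Ov", "HLC-Ov", "LLC-Ov", "LLC-Ov"]
    print(rank_llds(segs, labels, ["loudness", "zcr"], top=2))
```

On this toy data, "loudness" splits the two classes into pure clusters and obtains 1 bit of information gain. "zcr" leaves each cluster evenly mixed and obtains 0 bits. With real IS-2010 features, the ranking would depend on the clustering algorithm and on `k`, neither of which the text specifies.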