18.4.1 Clip Segmentation and Relabeling
The clips were segmented into consecutive audio segments. Three segment durations were chosen for the multi-resolution analysis: 1, 2, and 5 s. For a given segment duration, two segment-based detectors were designed: (1) a two-class classifier, referred to as an {N, O}-detector, which classifies a segment as Non-Ov (N) or Ov (O), and (2) a three-class classifier, referred to as an {N, L, H}-detector, which classifies a segment as Non-Ov (N), LLC-Ov (L), or HLC-Ov (H). Then, for multi-resolution detection, six SVM-based overlap detectors were developed: (1) three two-class SVM classifiers, called {N, O}_1, {N, O}_2, and {N, O}_5, one for each of the three durations, and (2) three three-class SVM classifiers, called {N, L, H}_1, {N, L, H}_2, and {N, L, H}_5, one for each of the three durations. These labels (N, O, L, and H) were computed from the SSPNet corpus metadata using the speaker segmentation and conflict annotations. The Train and Development sets were relabeled using this multi-resolution framework for overlap localization. For each clip, the diarization and conflict information are now represented by 102 labels: 60 labels for {N, O}_1 and {N, L, H}_1, 30 labels for {N, O}_2 and {N, L, H}_2, and 12 labels for {N, O}_5 and {N, L, H}_5. These new labels are used for the training and testing of the various overlap detectors.
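The relabeling scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the per-second overlap annotation format, the "any overlapped second makes the segment overlapped" rule, and the majority vote between L and H are all assumptions made for the example. It also shows where the 102-label count per 30 s clip comes from (30 + 30 labels at 1 s, 15 + 15 at 2 s, 6 + 6 at 5 s).

```python
CLIP_DURATION = 30  # seconds (the SSPNet clips are 30 s long)

def segment_labels(per_second, seg_dur):
    """Split a clip into consecutive seg_dur-second segments and assign
    an {N, O} label and an {N, L, H} label to each segment.

    per_second: list of 30 codes, one per second -- 'N' (no overlap),
    'L' (overlap in a low-level-conflict region), or
    'H' (overlap in a high-level-conflict region). This per-second
    representation is an assumption for illustration.
    """
    two_class, three_class = [], []
    for i in range(CLIP_DURATION // seg_dur):
        window = per_second[i * seg_dur:(i + 1) * seg_dur]
        if all(c == 'N' for c in window):
            two_class.append('N')
            three_class.append('N')
        else:
            # Assumed rule: any overlapped second marks the segment as Ov.
            two_class.append('O')
            # Assumed rule: dominant overlap type decides between L and H.
            three_class.append('H' if window.count('H') >= window.count('L')
                               else 'L')
    return two_class, three_class

# Example clip: a 3 s high-conflict overlap starting at t = 10 s.
per_second = ['N'] * CLIP_DURATION
per_second[10:13] = ['H'] * 3

labels = {}
for d in (1, 2, 5):
    labels[f'{{N,O}}_{d}'], labels[f'{{N,L,H}}_{d}'] = segment_labels(per_second, d)

total = sum(len(v) for v in labels.values())
print(total)  # 60 + 30 + 12 = 102 labels for the clip
```

Note how the same 3 s overlap appears as three O-segments at the 1 s resolution, two at 2 s, and one at 5 s, which is exactly the kind of coarse-to-fine localization the multi-resolution framework is after.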
In Figs. 18.5 and 18.6, the row labeled Time gives the time in seconds in the range from 1 to 30 (i.e., the clip duration), and the row labeled Segmentation represents the diarization metadata of the clip: N-segments are colored in white, L-segments
Fig. 18.5 Train set relabeling for the Train_0001 clip of low-level conflict

Fig. 18.6 Train set relabeling for the Train_0006 clip of high-level conflict