combined expecting synergies. Looking at line “2 · F1” in Table 12.6, up-sampling improves over the baseline setting in four out of six cases. Table 12.6 also shows detailed results for the case of up-sampling by copying (2 · F1) and confidences higher than 0.7.
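The combination of up-sampling by copying and confidence filtering can be sketched as follows. All function and variable names are illustrative assumptions; the actual feature extraction and classifier of the experiments are not shown.

```python
# Sketch: build a training set by up-sampling the originally labelled data
# by copying (factor 2, the "2 * F1" setting), then adding machine-labelled
# instances whose classifier confidence exceeds 0.7. Names are illustrative.

def build_training_set(labelled, machine_labelled, factor=2, threshold=0.7):
    """labelled: list of (features, label) pairs;
    machine_labelled: list of (features, predicted_label, confidence)."""
    training = list(labelled) * factor          # up-sampling by copying
    for feats, pred, conf in machine_labelled:
        if conf > threshold:                    # keep confident predictions only
            training.append((feats, pred))
    return training

labelled = [([0.1, 0.2], "nature"), ([0.4, 0.3], "crowd")]
machine = [([0.2, 0.1], "nature", 0.9), ([0.5, 0.5], "crowd", 0.4)]
train = build_training_set(labelled, machine)
# four copied labelled instances plus one confident machine-labelled instance
```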
Looking again at the UA values in Table 12.7, as one would expect, the best average result is obtained using the original labels and data of fold one and fold two for training (66.5 % UA). Then, semi-supervised learning significantly (one-sided z-test, p < 0.05) boosts the performance of sound event classification by an increase in UA of 2 % absolute over not using fold two data at all. This boost is almost half the one achieved by supervised training with all data (5.4 %) over using only fold one. The nature class, being the sparsest one, benefited most from semi-supervised learning. This effectively demonstrates the potential gain of semi-supervised learning for the exploitation of unlabelled audio data.
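The significance test mentioned above can be illustrated as a one-sided z-test for the difference of two proportions, treating UA as an accuracy over a number of trials. The instance counts below are illustrative assumptions, not the counts of the actual experiments.

```python
import math

def one_sided_z_test(p1, n1, p2, n2):
    """Test H1: p1 > p2 against H0: p1 == p2; returns (z, p_value)."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)              # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # standard error of the difference
    z = (p1 - p2) / se
    p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # 1 - Phi(z), one-sided
    return z, p_value

# e.g., a 2 % absolute UA gain (63.0 % vs. 61.0 %) over 5000 instances each
z, p_value = one_sided_z_test(0.630, 5000, 0.610, 5000)
```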
12.2.3 Summary
The potential of semi-supervised learning on a large-scale AEC task was investigated. As a result, adding unlabelled data with a high classifier confidence level to the human-labelled training data can enhance recognition performance. Up-sampling the originally labelled data and iterating the semi-supervised learning process both boosted classification accuracy in the experiments by emphasising the originally labelled data. Combining both strategies gradually increases the advantage of semi-supervised learning. As one would expect, the gain of semi-supervised learning stays below the gain that can be expected when adding the same amount of labelled data. Yet, the considerable effort and cost involved in human labelling of thousands of instances, together with the large amounts of sound event data publicly available, make semi-supervised learning a promising approach for future machine-based sound analysis.
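The iterative procedure summarised above, a self-training loop that repeatedly retrains, labels the unlabelled pool, and moves confident instances into the training set, can be sketched as follows. The nearest-centroid classifier is a toy stand-in for illustration; the original experiments used a different classifier.

```python
# Sketch of the iterative semi-supervised (self-training) loop: train, label
# the unlabelled pool, absorb confident instances, and repeat.

class CentroidClassifier:
    """Toy 1-D nearest-centroid classifier with a distance-based confidence."""
    def fit(self, training):
        sums, counts = {}, {}
        for x, y in training:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}

    def predict_with_confidence(self, x):
        label = min(self.centroids, key=lambda y: abs(x - self.centroids[y]))
        return label, 1.0 / (1.0 + abs(x - self.centroids[label]))

def self_training(classifier, labelled, unlabelled, threshold=0.7, iterations=3):
    training, pool = list(labelled), list(unlabelled)
    for _ in range(iterations):
        classifier.fit(training)
        remaining = []
        for x in pool:
            label, confidence = classifier.predict_with_confidence(x)
            if confidence > threshold:
                training.append((x, label))   # trust confident predictions
            else:
                remaining.append(x)           # revisit in a later iteration
        pool = remaining
    return training

training = self_training(CentroidClassifier(),
                         [(0.0, "speech"), (1.0, "nature")],
                         [0.1, 0.9, 0.5])
```

Instances near a class centroid are absorbed in the first iteration, while the ambiguous instance (0.5) stays in the pool because its confidence never exceeds the threshold.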
Future efforts could continue to focus on agglomeration of huge amounts of unla-
belled sound event data and its application in analysis of real-life sound streams—
ideally in combination with blind audio source separation.
12.3 Emotion
Similarly to the analysis of speech and music, where we first looked at 'what' was being said or played before looking at the affective side of speech and music, one can also attempt to automatically predict the emotion a sound event is likely to evoke in a listener. This will be the last application example presented in this book. It was first introduced in [12].
In fact, literature on emotion recognition from the acoustic channel—be it the emotion a listener thinks is contained or that she or he feels when listening—is