Fig. 12.1 UA and WA on the HU-ASA database by 8-state HMMs with left-right and cyclic topologies, depending on the number of mixtures per state. Solid line: WA, dashed line: UA [ 10 ]. (a) Left-right HMM, 2-class task; (b) cyclic HMM, 2-class task
for the HMMs is selectively shown in Fig. 12.1 for the 2-class task. Interestingly,
the cyclic HMM performs better than the left-right HMM for a small number of
mixtures. Further, the UA on the 5-class task seems to be largely unaffected by the
number of mixtures. This is surprising, given that ML classification partially
compensates for the unequal class distribution. LSTM RNNs outperform the HMMs on
the 2-class task, though not significantly (p ≥ 5 %). Yet, they have the lowest UA for
the 5-class task. Additional variation of the network layout may change this behaviour.
However, the lower performance on the 5-class task is likely due in part to the
sparseness of the non-bird classes, as LSTM RNNs have a comparatively high demand
for training data.
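The discussion above relies on two figures of merit, UA and WA. A minimal Python sketch of both follows, assuming the usual definitions: WA is the overall fraction of correctly classified instances (per-class recalls weighted by class frequency), and UA is the unweighted mean of the per-class recalls. The labels in the usage example are hypothetical and not taken from the HU-ASA database; they only show why the two measures diverge when classes are unevenly distributed.

```python
from collections import Counter
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")


def weighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """WA: share of all instances classified correctly (large classes dominate)."""
    _check(y_true, y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """UA: mean of per-class recalls, every class counts equally regardless of size."""
    _check(y_true, y_pred)
    support = Counter(y_true)  # instances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / n for c, n in support.items()]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical imbalanced 2-class data: 8 instances of one class, 2 of the other,
    # and a trivial classifier that always predicts the majority class.
    truth = ["bird"] * 8 + ["other"] * 2
    guess = ["bird"] * 10
    print(f"WA = {weighted_accuracy(truth, guess):.2f}")    # WA = 0.80
    print(f"UA = {unweighted_accuracy(truth, guess):.2f}")  # UA = 0.50
```

A majority-class guesser thus scores well on WA but only at chance level on UA, which is why UA is the more telling measure for sparse classes.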
12.1.4 Summary
In this section, an evaluation framework was shown for a challenging real-world
database of animal vocalisations. The performances of static and dynamic classifiers,
including LSTM networks, were compared. Dynamic classification provided higher
accuracy. In the comparison of 'standard' MFCC features with an enhanced feature
set containing pitch and voicing information, no clear preference could be determined.
Further evaluations in this direction are needed to reveal the relevance of different
LLD and functional types for the classification of animal vocalisations.
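The static/dynamic distinction concerns what each classifier consumes. As a rough sketch, assuming a sequence of per-frame LLD vectors (e.g., MFCCs, pitch, voicing), a dynamic classifier such as an HMM or an LSTM RNN processes the frames directly, whereas a static classifier receives one fixed-length vector of functionals computed over each LLD contour. The five functionals below (mean, standard deviation, minimum, maximum, range) and the name `functionals` are an illustrative assumption, not the set used in this evaluation.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def functionals(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (frames x LLDs) sequence into one fixed-length vector.

    For each LLD contour: mean, standard deviation, minimum, maximum, range.
    The output length is 5 * number_of_LLDs, independent of the number of frames.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_lld = len(frames[0])
    if any(len(frame) != n_lld for frame in frames):
        raise ValueError("all frames must have the same number of LLDs")
    vector: List[float] = []
    for j in range(n_lld):
        contour = [frame[j] for frame in frames]  # time series of LLD j
        lo, hi = min(contour), max(contour)
        vector.extend([fmean(contour), pstdev(contour), lo, hi, hi - lo])
    return vector


if __name__ == "__main__":
    # Hypothetical 3-frame input with 2 LLDs per frame (values are made up).
    sequence = [[1.0, 0.0], [3.0, 0.0], [2.0, 6.0]]
    # A dynamic classifier would consume `sequence` frame by frame;
    # a static classifier instead receives the fixed-length vector below.
    print(len(functionals(sequence)))  # 10
```

Because the functional vector has the same length for every recording, any static classifier can be applied; the price is that the temporal order of the frames is discarded.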
From a classifier point of view, a hierarchical classification framework could be
attempted, e.g., by combining the songbird/non-songbird classifier with a bird song
recogniser.
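The hierarchical framework proposed above could take the form of a two-stage cascade. The following sketch is hypothetical: the stage classifiers are passed in as plain functions, and the names `hierarchical_classify`, `stage1`, `songbird_stage2` and `other_stage2` as well as the toy models in the usage block are illustrative, not part of the evaluation framework.

```python
from typing import Callable, Optional, Sequence

Features = Sequence[float]


def hierarchical_classify(
    x: Features,
    stage1: Callable[[Features], bool],
    songbird_stage2: Callable[[Features], str],
    other_stage2: Optional[Callable[[Features], str]] = None,
) -> str:
    """Route x through a coarse songbird/non-songbird decision, then a specialised stage.

    All classifiers are hypothetical placeholders supplied by the caller.
    """
    if stage1(x):  # stage 1: songbird vs. non-songbird
        return songbird_stage2(x)  # stage 2: bird song recogniser
    # Without a dedicated second stage, non-songbird input keeps the coarse label.
    if other_stage2 is None:
        return "non-songbird"
    return other_stage2(x)


if __name__ == "__main__":
    # Toy stand-ins for trained models (hypothetical thresholds on a 2-D feature vector).
    is_songbird = lambda x: x[0] > 0.5
    song_recogniser = lambda x: "species_A" if x[1] > 0.0 else "species_B"
    print(hierarchical_classify([0.9, 1.0], is_songbird, song_recogniser))  # species_A
    print(hierarchical_classify([0.1, 1.0], is_songbird, song_recogniser))  # non-songbird
```

Routing only songbird candidates to the second stage lets the bird song recogniser be trained on songbird data alone; the price is that any stage-one error propagates, since a misrouted instance never reaches the appropriate recogniser.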
 
 