Table 12.1 Number of instances, as well as min(imum), mean, max(imum), and total recording length of the audio files by the biological class of the species in the HU-ASA database

(Biological) Class  # Instances  Min [s]  Mean [s]  Max [s]  Sum [s]
Aves                868          2.4      14.8      64.7     12 210
Mammalia            487          1.0      14.7      37.7     6 954
Amphibia            27           1.8      19.6      65.9     529
Reptilia            7            11.2     22.5      39.6     157
Insecta             19           2.3      16.0      30.1     287
Other               10                                       133
Sum                 1 418                                    20 423
Table 12.2 Distribution of instances in the 2-class (Passeriformes / Non-Passeriformes) and 5-class tasks as defined on the HU-ASA database

Class                # Instances
Passeriformes        282
Non-Passeriformes    586
Sum (2-class task)   868
Primates             90
Canidae              43
Felidae              62
Sum (5-class task)   1 063
The more complex 5-class task adds mammals (Mammalia) of the families Felidae
and Canidae, as well as the instances of the biological order Primates (cf. Table 12.2).
A particular challenge arises from the real-world nature of the database: vocalisations
of the same species often vary considerably, depending on the situation and stance
(e.g., aggression or warning calls) and on the age of the animals, from young to full-grown.
The recordings are further corrupted by background noise, in some cases even by the
vocalisations of other animal species.
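The relation between the two tasks can be summarised in a few lines of Python. The sketch below, whose dictionary and function names are hypothetical, encodes the per-class counts from Table 12.2, confirms the two task totals (868 and 1 063), and derives the accuracy of a trivial majority-class predictor. That baseline is not reported in the source; it is included only as an illustrative way to gauge the class imbalance of each task.

```python
# Sketch of the two HU-ASA task definitions, using the instance counts
# reported in Table 12.2. All names here are hypothetical.

# 2-class task: bird recordings split by biological order.
TWO_CLASS_COUNTS = {
    "Passeriformes": 282,
    "Non-Passeriformes": 586,
}

# 5-class task: the two bird classes plus three groups of mammals.
FIVE_CLASS_COUNTS = {
    **TWO_CLASS_COUNTS,
    "Primates": 90,  # biological order
    "Canidae": 43,   # family
    "Felidae": 62,   # family
}


def task_size(counts: dict[str, int]) -> int:
    """Total number of instances in a classification task."""
    return sum(counts.values())


def majority_baseline(counts: dict[str, int]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    return max(counts.values()) / task_size(counts)


if __name__ == "__main__":
    for name, counts in (("2-class", TWO_CLASS_COUNTS), ("5-class", FIVE_CLASS_COUNTS)):
        print(f"{name}: {task_size(counts)} instances, "
              f"majority baseline {majority_baseline(counts):.3f}")
```

Run as a script, this reports 868 and 1 063 instances, with majority baselines of about 0.675 (2-class) and 0.551 (5-class), since Non-Passeriformes is the largest class in both tasks.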
12.1.2 Methodology
Static classification is based on SVMs with a linear kernel. For dynamic classification,
HMMs with two different topologies and LSTM RNNs are compared. A typical HMM topology
in audio (and general sequence) classification is a linear (left-right) layout: with $N$ as
the total number of states, transitions are allowed from each state $i = 1, \dots, N-1$
to states $i$ and $i+1$. However, animal vocalisations are often highly repetitive,
motivating the use of a cyclic topology, in which a transition from state $N$ back
to the first state is added. In the following experiments, the number of states is fixed
to $N = 8$, based on a series of evaluations.
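To make the difference between the two HMM topologies concrete, the following Python sketch builds a transition matrix for each layout and checks whether a state path that runs through all states twice (a crude stand-in for a repeated call) is permitted. The function names, the 0-based state indexing, the self-loop on the last state, and the uniform initial probabilities are assumptions for illustration; this is not the toolkit configuration used in the experiments.

```python
# Sketch: transition structure of linear (left-right) and cyclic HMM
# topologies with N states. States are 0-indexed here, so state 1 in the
# text is index 0. Uniform initial probabilities over the allowed
# transitions are an assumption; training would re-estimate them.


def transition_matrix(n_states: int, cyclic: bool = False) -> list[list[float]]:
    """Return an n_states x n_states row-stochastic transition matrix."""
    if n_states < 2:
        raise ValueError("need at least two states")
    allowed = [[False] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        allowed[i][i] = True      # self-loop: stay in state i
        allowed[i][i + 1] = True  # advance to the next state
    # Assumption: the last state keeps a self-loop.
    allowed[n_states - 1][n_states - 1] = True
    if cyclic:
        # Cyclic topology: extra jump from state N back to state 1.
        allowed[n_states - 1][0] = True
    # Spread each row's probability mass uniformly over its allowed targets.
    matrix = []
    for row in allowed:
        k = sum(row)  # number of allowed targets in this row
        matrix.append([1.0 / k if ok else 0.0 for ok in row])
    return matrix


def path_is_allowed(matrix: list[list[float]], path: list[int]) -> bool:
    """True if every step of a state path has non-zero transition probability."""
    return all(matrix[a][b] > 0.0 for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    N = 8  # number of states used in the experiments
    linear = transition_matrix(N)
    cyclic = transition_matrix(N, cyclic=True)
    # A "call" repeated twice: run through all states, then start over.
    repeated_call = list(range(N)) * 2
    print("linear allows repetition:", path_is_allowed(linear, repeated_call))
    print("cyclic allows repetition:", path_is_allowed(cyclic, repeated_call))
```

With $N = 8$, the left-right matrix rejects the repeated path because it has no transition from the last state back to the first, while the cyclic matrix accepts it through the added transition. Since Baum-Welch re-estimation keeps zero transition probabilities at zero, the topology chosen at initialisation constrains the trained model.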
 
 