Table 14.2 continued

Task             #Classes  Database     #Train    #Test  Test method  UA [%]  WA [%]
Music classes
Metre                   2  BRD            1855     1855  SCV            97.1    96.6
Dance style             9  BRD            1855     1855  SCV            88.9    89.1
Tempo                 141  BRD            1855     1855  SCV            89.0    88.5
Key                    12  KEY-ALL         521      521  SCV               -    77.3
Key                    24  KEY-ALL         521      521  SCV               -    62.1
Chords                 24  ChoRD         10702    10702  LOSO              -   60.13
Chords                 36  ChoRD         10702    10702  LOSO              -   48.84
Mood-arousal            3  NTWICM         1376     1272  T/D/T          56.2    58.7
Mood-valence            3  NTWICM         1376     1272  T/D/T          61.2    61.0
Voice presence          2  UltraStar    326527    97144  T/D/T         75.77   75.81
Singer age              2  UltraStar    315043    93342  T/D/T         57.55   56.56
Singer gender           2  UltraStar    326198    96373  T/D/T         89.61   93.60
Singer height           2  UltraStar    280714    80962  T/D/T         72.07   78.26
Singer race             2  UltraStar    321178    96563  T/D/T         63.30   76.98
Sound classes
Birds                   2  HU-ASA          868      868  SCV            80.0    81.3
Animals                 5  HU-ASA         1063     1063  SCV            49.5    64.0
Acoustic events         7  FindSounds    11292     5645  T/D/T          66.5    71.7
Given are the number of classes, the database, the numbers of training instances (training and development instances combined) and test instances, and the test method, where T/(D/)T denotes a train/(development/)test partitioning and SCV is always ten-fold. The precision to which the results are reported depends on the number of test instances.
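The table reports both UA and WA. Assuming the convention common in this field (UA as the unweighted mean of per-class recalls, WA as overall accuracy, i.e., per-class recall weighted by class frequency), the following minimal Python sketch shows how the two figures are obtained and why they can diverge when classes are imbalanced. The function name and the toy labels are hypothetical illustrations, not code or data from the studies summarised above.

```python
from collections import Counter


def unweighted_and_weighted_accuracy(y_true, y_pred):
    """Hypothetical helper returning (UA, WA) in percent.

    Assumes UA = mean of per-class recalls (every class counts equally)
    and WA = overall accuracy (every instance counts equally, so classes
    are implicitly weighted by their frequency).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    support = Counter(y_true)  # number of instances per true class
    # number of correctly classified instances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / n for c, n in support.items()]
    ua = 100.0 * sum(recalls) / len(recalls)
    wa = 100.0 * sum(hits.values()) / len(y_true)
    return ua, wa


# Hypothetical imbalanced two-class example (not data from the table):
# 80 instances of class "a", 20 instances of class "b".
y_true = ["a"] * 80 + ["b"] * 20
y_pred = ["a"] * 76 + ["b"] * 4 + ["a"] * 10 + ["b"] * 10
ua, wa = unweighted_and_weighted_accuracy(y_true, y_pred)
print(f"UA = {ua:.2f} %, WA = {wa:.2f} %")  # UA = 72.50 %, WA = 86.00 %
```

With 95 % recall on the majority class and only 50 % on the minority class, WA reaches 86 % while UA drops to 72.5 %; a large gap between the two columns therefore hints at class imbalance combined with weaker performance on the rarer classes.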
A challenge remaining at that point will be the careful evaluation of the ethical issues that arise once machines can listen to and understand arbitrary audio, including personal information and details.
Finally, given such holistic analysis capabilities based on highly efficient source separation and the synergistic coupling of tasks, future audio analysis systems can start to train themselves on a massive scale, for example by crawling the Internet for audio or by listening to very general media broadcasts, potentially reaching supra-human capabilities in some of the tasks alluded to.