Table 3. Classification errors with respect to different learning schemes

Learning scheme   Classic/pop   Techno/pop
SVM (linear)            0.00%        6.88%
SVM (rbf)               1.50%       14.38%
C4.5                    0.00%        7.50%
k-NN                    3.00%        9.38%
Naive Bayes             2.50%       10.63%

Table 4. Classification according to user preferences

Measure     User 1   User 2   User 3   User 4
Accuracy    95.19%   92.14%   90.56%   84.55%
Precision   92.70%   98.33%   90.83%   85.87%
Recall      99.00%   84.67%   93.00%   83.74%
Error        4.81%    7.86%    9.44%   15.45%
Techno/pop: 80 songs for each class from a large variety of artists, in Ogg Vorbis format.

Hiphop/pop: 120 songs for each class from only a few records, available in MP3 format with a coding of 128 kbit/s.

The classification tasks are of increasing difficulty. Using mySVM with a linear kernel, the performance was estimated by 10-fold cross-validation and is shown in Table 1. For comparison, accuracies of 93% for classic vs. pop and 66% for hiphop vs. pop have been published (Tzanetakis, 2002; Tzanetakis et al., 2001).

In total, 41 features have been constructed for all genre classification tasks (the full list is given in Mierswa, 2004). For the distinction between classic and pop, the evolutionary approach selected 21 features for mySVM. Most runs selected features referring to the phase space (angle and variance). Feature usage can also be inspected by restricting the top-down induction of decision trees to a few levels. For a one-level stump, 93% accuracy could be achieved using the RMS volume alone, i.e., the root mean square average of the series. For the separation of techno and pop, 18 features were selected for mySVM, the most frequently selected ones being the filtering of those positions in the index dimension where the curve crosses the zero line. The decision tree starts with a phase space feature, the average of angles. A one-level stump uses the starting value of the second frequency band, giving a benchmark of 76% accuracy. For the classification into hiphop and pop, 22 features were selected, with plain volume being the most frequently selected feature. The decision tree classifying hiphop against pop is rather complex; it starts with the length of the songs. Experiments with naive Bayes and k-NN did not change the picture: an accuracy of about 75% can easily be achieved, but increasing the performance further demands better features.

To demonstrate the effect of tailoring the feature set to each classification task, we performed experiments with the same feature set for all data sets. To simulate a reasonable standard feature set, we used only those features that appeared in at least 50% of all subsets produced by feature selection across all data sets. Table 2 shows the classification performance of a linear SVM, estimated with 10-fold cross-validation. The performance is significantly lower than the performance that can be achieved with the tailored feature sets (see Table 1).

Table 3 shows the classification errors obtained with different learning schemes. Since the feature extraction and the transformation into another feature space are performed by the applied method tree, using a linear kernel function is not actually a restriction: a linear separation in the constructed feature space may correspond to a nonlinear one in terms of the original audio data. Therefore, we use a linear SVM for all our experiments and as the inner learner for estimating the fitness of the method trees. Tables 2 and 3 indicate that a tailored set of task-specific features, and not the quality of the learning scheme, is the crucial aspect for the successful classification of audio data. More details can be found in Mierswa and Morik (2005).
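The "standard feature set" baseline described above keeps only those features that were chosen in at least half of the subsets returned by feature selection. A minimal Python sketch of that frequency filter follows. It assumes each feature-selection run is represented as a set of feature names pooled over all data sets; the feature names and example runs are hypothetical placeholders, not the actual 41 features from Mierswa (2004), and the 50% threshold is exposed as a parameter.

```python
from collections import Counter
from typing import Iterable, List, Set


def standard_feature_set(subsets: Iterable[Set[str]], min_fraction: float = 0.5) -> List[str]:
    """Return the features selected in at least `min_fraction` of all subsets.

    Assumption (hypothetical input format): `subsets` holds one set of
    feature names per feature-selection run, pooled over all data sets.
    """
    runs = list(subsets)
    if not runs:
        return []
    # Count in how many runs each feature occurs; set() avoids double counting.
    counts = Counter(f for run in runs for f in set(run))
    threshold = min_fraction * len(runs)
    # Keep features meeting the threshold; sort for a deterministic result.
    return sorted(f for f, c in counts.items() if c >= threshold)


if __name__ == "__main__":
    # Hypothetical selection results from four runs (names are illustrative only).
    runs = [
        {"rms_volume", "phase_space_angle", "zero_crossings"},
        {"rms_volume", "phase_space_variance"},
        {"rms_volume", "phase_space_angle", "song_length"},
        {"zero_crossings", "band2_start"},
    ]
    print(standard_feature_set(runs))
    # -> ['phase_space_angle', 'rms_volume', 'zero_crossings']
```

With four runs, the threshold is two occurrences, so only features picked in at least two runs survive. Raising `min_fraction` shrinks the shared set further, which is one way to see why a single fixed set tends to lose task-specific features.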