Collaborative Use of Features in a Distributed System for the Organization of Music Collections - Intelligent Music Information Systems: Tools and Methodologies


Table 3. Classification errors with respect to different learning schemes

Learning scheme   Classic/pop   Techno/pop
SVM (linear)      0.00%         6.88%
SVM (rbf)         1.50%         14.38%
C4.5              0.00%         7.50%
k-NN              3.00%         9.38%
Naive Bayes       2.50%         10.63%

Table 4. Classification according to user preferences

Measure     User 1   User 2   User 3   User 4
Accuracy    95.19%   92.14%   90.56%   84.55%
Precision   92.70%   98.33%   90.83%   85.87%
Recall      99.00%   84.67%   93.00%   83.74%
Error       4.81%    7.86%    9.44%    15.45%
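The measures reported in Table 4 are not defined on this page. Assuming they follow the usual binary-classification definitions (the accuracy and error rows do sum to 100% for every user, which is consistent with error = 1 - accuracy), they can be sketched as follows. The function name and the confusion counts are hypothetical and are not taken from the experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and error from binary confusion counts.

    Assumes the standard definitions; the source does not spell them out.
    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "error": 1.0 - accuracy,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

High recall combined with lower precision, as for User 1, would correspond to few false negatives but more false positives under these definitions.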

• Techno/pop: 80 songs for each class from a large variety of artists in Ogg Vorbis format.

• Hiphop/pop: 120 songs for each class from a few records, available in MP3 format encoded at 128 kbit/s.

The classification tasks are of increasing difficulty. Using mySVM with a linear kernel, the performance was determined by a 10-fold cross validation and is shown in Table 1. For classic vs. pop, an accuracy of 93% has been published, and for hiphop vs. pop, 66% (Tzanetakis, 2002; Tzanetakis et al., 2001).

In total, 41 features were constructed for all genre classification tasks (the full list is available in Mierswa, 2004). For the distinction between classic and pop, 21 features were selected for mySVM by the evolutionary approach. Most runs selected features referring to the phase space (angle and variance). The use of features can also be inspected by restricting a top-down induction of decision trees to a few levels. For a one-level stump, 93% accuracy could be achieved using only the RMS volume, i.e., the root mean square average of the series. For the separation of techno and pop, 18 features were selected for mySVM, the most frequently selected ones being the filtering of those positions in the index dimension where the curve crosses the zero line. The decision tree starts with a phase space feature, the average of angles. A one-level stump uses the starting value of the second frequency band, giving a benchmark of 76% accuracy. For the classification into hiphop and pop, 22 features were selected, with the mere volume being the most frequently selected feature. The decision tree classifying hiphop against pop is rather complex. It starts with the length of the songs. Experiments with naive Bayes and k-NN did not change the picture: an accuracy of about 75% can easily be achieved, but increasing the performance further demands better features.

To demonstrate the effect of tailored feature sets for each classification task, we performed experiments with the same feature set for all data sets. To simulate a reasonable standard feature set, we used only features that occurred in at least 50% of all subsets produced by feature selection for all data sets. Table 2 shows the classification performance for a linear SVM estimated with a 10-fold cross validation. The performance is significantly lower than that achieved with the tailored feature sets (see Table 1).

Table 3 shows the achieved classification errors with respect to different learning schemes. Since the extraction of features and the transformation into another feature space are performed by the applied method tree, the use of a linear kernel function is actually no restriction. Therefore, we use a linear SVM for all our experiments and as inner learner to estimate the fitness of the method trees. Tables 2 and 3 indicate that a tailored set of task-specific features, and not the quality of the learning scheme, is the crucial aspect for the successful classification of audio data. More details can be found in Mierswa and Morik (2005).
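Two of the features named in this section, the RMS volume and the positions where the curve crosses the zero line, are simple to state in code. The following is a minimal sketch, assuming the audio is given as a plain list of sample values. The function names, the treatment of exact zeros, and the generic one-level stump with its threshold and labels are illustrative assumptions, not the chapter's implementation.

```python
import math


def rms_volume(series):
    """Root mean square average of the series."""
    return math.sqrt(sum(x * x for x in series) / len(series))


def zero_crossings(series):
    """Indices i at which the sign changes between series[i-1] and series[i].

    Assumption: a sample equal to zero is treated as non-negative.
    """
    return [
        i
        for i in range(1, len(series))
        if (series[i - 1] < 0) != (series[i] < 0)
    ]


def stump_predict(value, threshold, below_label, above_label):
    """One-level decision stump on a single feature value.

    The threshold and both labels are hypothetical placeholders; the source
    reports only the accuracy of such a stump, not its split point.
    """
    return above_label if value > threshold else below_label


# Toy signal for illustration only.
signal = [0.5, -0.5, 0.5, -0.5]
print(rms_volume(signal))       # 0.5
print(zero_crossings(signal))   # [1, 2, 3]
print(stump_predict(rms_volume(signal), 0.3, "class A", "class B"))
```

A stump of this kind uses a single feature and a single comparison, which is why the accuracy it reaches is a convenient benchmark for how informative that feature is on its own.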
