Genre Extraction
Musical genres are categorical descriptions that are used to characterize music in music stores, radio stations, and now on the Internet. Although the division of music into genres is somewhat subjective and arbitrary, there are perceptual criteria related to the texture, instrumentation, and rhythmic structure of music that can be used to characterize a particular genre. Humans are remarkably good at genre classification: Pachet and Cazaly (2000) show that listeners can accurately predict a musical genre from as little as 250 milliseconds of audio. This finding suggests that humans can judge genre using only the musical surface, without constructing any higher-level theoretic descriptions, as has been argued in Davis and Mermelstein (1980). Up to now, genre classification for digitally available music has been performed manually, so techniques for automatic genre classification would be a valuable addition to audio information retrieval systems for music.

Cook (2002) addresses the problem of automatically classifying audio signals into a hierarchy of musical genres. More specifically, three sets of features for representing timbral texture, rhythmic content, and pitch content are proposed. Although there has been significant work on features for speech recognition and music/speech discrimination, there has been relatively little work on features designed specifically for music signals. While the timbral-texture feature set is based on features used for speech and general sound classification, the other two feature sets (rhythmic and pitch content) are new and specifically designed to represent aspects of musical content (rhythm and harmony). The performance and relative importance of the proposed feature sets are evaluated by training statistical pattern recognition classifiers on audio collections gathered from compact disks, radio, and the Web. Audio signals can be classified into a hierarchy of music genres, augmented with speech categories; the speech categories are useful for radio and television broadcasts. Both whole-file classification and real-time frame classification schemes are proposed.

There have been numerous attempts at extracting genre information automatically from the audio signal, using signal processing techniques and machine learning schemes. Cook (2002) identifies and reviews two different approaches to automatic musical genre classification. The first approach is prescriptive, as it tries to classify songs into an arbitrary taxonomy given a priori. The second approach adopts the reverse point of view, in which the classification emerges from the songs themselves.

Prescriptive approach: these systems assume that a genre taxonomy is given and should be superimposed on the database of songs. They all proceed in two steps:

Frame-based feature extraction: the music signal is cut into frames, and a feature vector of low-level descriptors of timbre, rhythm, and so forth is computed for each frame.

Machine learning/classification: a classification algorithm is then applied to the set of feature vectors to label each frame with its most probable class: its "genre." The class models used in this phase are trained beforehand, in a supervised way.

The features used in the first step of automatic, prescriptive genre classification systems fall into three sets: timbre related, rhythm related, and pitch related.

Similarity relations approach: the second approach to automatic genre classification is exactly opposite to the prescriptive approach just reviewed. Instead of assuming that a genre taxonomy is given a priori, it lets the classification emerge from similarity relations between the songs themselves.
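The two-step prescriptive pipeline (frame-based feature extraction followed by supervised classification) can be sketched as follows. This is a minimal illustration, not the system described in Cook (2002): the frame length and hop size, the three toy timbral descriptors (zero-crossing rate, RMS energy, spectral centroid), and the nearest-centroid classifier are simplified stand-ins for the full timbral/rhythmic/pitch feature sets and the statistical pattern recognition classifiers discussed in the text.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Step 1a: cut a 1-D audio signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def frame_features(frames, sr=22050):
    """Step 1b: low-level timbral descriptors per frame
    (zero-crossing rate, RMS energy, spectral centroid)."""
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = (mag @ freqs) / (mag.sum(axis=1) + 1e-12)
    return np.column_stack([zcr, rms, centroid])

class NearestCentroid:
    """Step 2: toy supervised classifier, one centroid per genre label,
    trained beforehand on labeled feature vectors."""
    def fit(self, X, y):
        self.labels_ = sorted(set(y))
        self.centroids_ = np.stack(
            [X[np.asarray(y) == c].mean(axis=0) for c in self.labels_])
        return self
    def predict(self, X):
        # Label each frame with its nearest class centroid: its "genre."
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return [self.labels_[i] for i in d.argmin(axis=1)]
```

Per-frame predictions support the real-time frame classification scheme; a majority vote over all frames of a file yields the whole-file label.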