Rather than choosing music by artist or album, one sometimes wishes for music
that 'fits the occasion' or one's mood, such as when jogging, relaxing, or perhaps
having dinner for two. Thus, tags such as 'activating', 'calming' or 'romantic' would
be of help in music retrieval [147, 148]. Manual annotation by individual users
seems rather labour intensive, but some services, such as Allmusic, 7 provide such
tags, often based on several users' ratings. Regrettably, this information is
not always reliable, as the tags are often attached only to artists rather than to single
tracks. This motivates automated mood classification of music. In this
section, we will thus have a look at audio features suited for this particular task,
and benchmark results reachable with state-of-the-art approaches under real-world
conditions—without pre-selection of instances, e.g., by limiting analysis to those
with majority agreement of annotators.
Features for mood recognition can be extracted from the raw audio stream, but
also from metadata. Those derived from the audio can be complemented by mid-level ones
based on pre-classification. This means that, apart from the LLDs and functionals
as introduced in Sect. 11.6, knowledge from other classification tasks such as the
ones introduced for music processing in this chapter can be used as mid-level feature
information describing concepts such as rhythm or tonal structure. Metadata on the
other hand includes all types of textual information available on a music track such
as title, artist, genre, year of release or lyrics.
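To illustrate how these heterogeneous sources could be combined, the following Python
sketch concatenates functionals of LLDs, hypothetical mid-level classifier posteriors,
and a simple metadata encoding into a single feature vector; all names and encodings
here are illustrative assumptions, not part of any specific toolkit.

import numpy as np

def assemble_mood_features(lld_functionals, midlevel_posteriors, metadata):
    # lld_functionals:     functionals (mean, std, ...) of frame-wise LLDs
    # midlevel_posteriors: posteriors of pre-classifiers, e.g., rhythm or key
    #                      (hypothetical inputs)
    # metadata:            dict with textual fields such as 'genre' or 'year'
    # Very simple metadata encoding; real systems would use richer text
    # features, e.g., a bag-of-words representation of the lyrics.
    year = float(metadata.get('year', 2000)) / 2100.0
    is_rock = 1.0 if metadata.get('genre', '').lower() == 'rock' else 0.0
    meta_vec = np.array([year, is_rock])
    return np.concatenate([lld_functionals, midlevel_posteriors, meta_vec])

# Usage with dummy values:
x = assemble_mood_features(np.zeros(64), np.array([0.7, 0.2, 0.1]),
                           {'genre': 'Rock', 'year': 1999})
print(x.shape)  # (69,)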
In the literature so far, some commonalities are visible: In [149], a 30-element
feature vector containing timbre, pitch, and rhythm information is used. The work
in [150] employs timbre features given by spectral centroid, bandwidth, roll-off, and
spectral flux, as well as the minimum, maximum, and average amplitude plus RMS energy
of seven octave-interval sub-bands. For rhythm information, the lowest sub-band is used.
Edge detection with a Canny estimator leads to a rhythm curve. Peaks in this curve
are assumed to indicate the onsets of bass instruments, and their strength serves as
an indication of the degree of rhythm presence. Further, analysis by ACF serves as a
measure of rhythm steadiness, and the common divisor of the correlation peaks yields
the tempo.
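As a rough illustration of such sub-band rhythm analysis, the following Python sketch
derives a rhythm curve from the energy envelope of the lowest sub-band and estimates a
tempo candidate and a steadiness measure from its ACF; a simple differentiated envelope
stands in for the Canny-based edge detection of [150], and file name, cut-off frequency,
and frame parameters are placeholders.

import numpy as np
import librosa
from scipy.signal import butter, sosfilt, find_peaks

y, sr = librosa.load('track.wav', sr=22050, mono=True)  # placeholder file

# Lowest sub-band (roughly the bass register, here below 250 Hz).
sos = butter(4, 250.0, btype='low', fs=sr, output='sos')
low = sosfilt(sos, y)

# Short-time energy envelope of the sub-band as a stand-in rhythm curve;
# half-wave rectified differences emphasise onsets of bass instruments.
hop = 512
frames = librosa.util.frame(low, frame_length=2048, hop_length=hop)
env = np.sqrt((frames ** 2).mean(axis=0))
curve = np.maximum(np.diff(env, prepend=env[0]), 0.0)

# ACF of the rhythm curve: the dominant peak's lag gives a tempo candidate,
# its normalised height a crude measure of rhythm steadiness.
acf = librosa.autocorrelate(curve)
norm = acf[0] + 1e-9
acf[0] = 0.0
peaks, _ = find_peaks(acf)
if len(peaks):
    lag = peaks[np.argmax(acf[peaks])]
    tempo_bpm = 60.0 * sr / (hop * lag)
    steadiness = acf[lag] / norm
    print(round(tempo_bpm, 1), round(steadiness, 2))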
In [151], an extension is presented for the rhythm analysis by adding up all sub-band
onset curves. The authors of [152] also use rhythm and timbre features: Two tempo
candidates in BPM are based on peaks in an ACF-based beat histogram. From this
histogram, amplitude ratios and the sum of its ranges are added. Timbre is based on
13 MFCCs [153] and spectral centroid, flux, and roll-off. The mean and standard
deviation of the features over all frames are also included.
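A minimal sketch of such frame-wise timbre descriptors and their mean and standard
deviation functionals, assuming the librosa library and illustrative parameter choices,
could look as follows.

import numpy as np
import librosa

y, sr = librosa.load('track.wav', sr=22050, mono=True)     # placeholder file
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))    # magnitude spectrogram

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=512)
centroid = librosa.feature.spectral_centroid(S=S, sr=sr)
rolloff = librosa.feature.spectral_rolloff(S=S, sr=sr)

# Spectral flux: frame-to-frame change of the magnitude spectrum.
flux = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0))
flux = np.concatenate([[0.0], flux])[np.newaxis, :]

lld = np.vstack([mfcc, centroid, rolloff, flux])            # (16, n_frames)
functionals = np.concatenate([lld.mean(axis=1), lld.std(axis=1)])
print(functionals.shape)                                    # (32,): 16 LLDs x {mean, std}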
In [154], a contribution to the MIREX 2008 8 audio mood classification task, MFCCs,
CHROMA, and spectral crest and flatness are employed; the latter two describe whether
the signal spectrum contains peaks, e.g., in the case of sinusoidal signals, or whether
it is flat, indicating noise.
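The following sketch, again assuming librosa and placeholder parameters, indicates how
spectral flatness and a crest measure separate peaky (tonal) from flat (noise-like)
spectra, alongside CHROMA and MFCC extraction.

import numpy as np
import librosa

y, sr = librosa.load('track.wav', sr=22050, mono=True)  # placeholder file
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

flatness = librosa.feature.spectral_flatness(S=S)        # near 1 for noise, near 0 for tones
crest = S.max(axis=0) / (S.mean(axis=0) + 1e-9)          # large for sinusoidal peaks

chroma = librosa.feature.chroma_stft(S=S ** 2, sr=sr)    # 12-dimensional pitch-class profile
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(float(flatness.mean()), float(crest.mean()))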
The learning algorithms vary strongly for this task, just as the mood taxonomies
do (cf. Sect. 5.3.2). In fact, the diverse mood models certainly influence the selection
of the learning algorithm. As an example, in [150, 151] a four-class dimensional
model is handled by GMMs as the basis for a hierarchical classification system (HCS).
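A schematic illustration of such a two-stage GMM hierarchy over the four quadrants of
an arousal-valence plane is sketched below; it uses random data for demonstration only
and is not the exact system of [150, 151].

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 32))                 # 400 tracks, 32 functionals each
arousal = rng.integers(0, 2, 400)              # 0 = low, 1 = high
valence = rng.integers(0, 2, 400)              # 0 = negative, 1 = positive

def fit_binary_gmms(X, labels, n_components=4):
    # One GMM per class; classification by maximum average log-likelihood.
    return [GaussianMixture(n_components, covariance_type='diag',
                            random_state=0).fit(X[labels == c]) for c in (0, 1)]

stage1 = fit_binary_gmms(X, arousal)                           # arousal split
stage2 = {a: fit_binary_gmms(X[arousal == a], valence[arousal == a])
          for a in (0, 1)}                                     # valence per branch

def classify(x):
    x = x.reshape(1, -1)
    a = int(np.argmax([g.score(x) for g in stage1]))
    v = int(np.argmax([g.score(x) for g in stage2[a]]))
    return a, v                                                # quadrant label

print(classify(X[0]))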
7 Allmusic (http://www.allmusic.com)
8 MIREX 2008 (http://www.music-ir.org/mirex/2008)