received little attention in the past. This is mainly
due to two factors: (a) the high complexity of
the feature extraction process that is required to
characterize expressive performance, and (b) the
question of how to use the information provided
by an expressive performance model for the task
of performance-based interpreter identification.
To the best of our knowledge, the only group
working on performance-based automatic inter-
preter identification is the group led by Gerhard
Widmer. Saunders, Hardoon, Shawe-Taylor, and
Widmer (2004) apply string kernels to the problem
of recognizing famous pianists from their playing
style. The characteristics of performers playing
the same piece are obtained from changes in beat-
level tempo and beat-level loudness. From such
characteristics, general performance alphabets
can be derived, and pianists' performances can
then be represented as strings. They apply both
kernel partial least squares and Support Vector
Machines to this data.
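To make the string-based representation concrete, the sketch below shows one way that beat-level tempo and loudness changes could be discretized into a small performance alphabet; the symbols and the change threshold are our own illustrative assumptions, not the alphabet actually used by Saunders et al.

```python
# Hypothetical sketch: encode a performance as a string over a small
# "performance alphabet" by discretizing beat-level tempo and loudness
# changes. Symbols and threshold are illustrative, not the published alphabet.

def performance_string(tempi, loudness, threshold=0.02):
    """Map per-beat tempo and loudness curves to a symbol string."""
    def direction(prev, curr):
        change = (curr - prev) / prev
        if change > threshold:
            return "u"   # increase
        if change < -threshold:
            return "d"   # decrease
        return "s"       # roughly steady

    symbols = []
    for i in range(1, len(tempi)):
        t = direction(tempi[i - 1], tempi[i])        # tempo movement
        l = direction(loudness[i - 1], loudness[i])  # loudness movement
        symbols.append(t + l)   # e.g., "ud" = tempo up, loudness down
    return " ".join(symbols)

# Two beats of slowing down while getting louder -> "du du"
print(performance_string([120, 116, 112], [0.50, 0.55, 0.60]))
```

String kernels then compare two performances by counting shared subsequences of such symbols, which is what makes kernel partial least squares and Support Vector Machines applicable to this representation.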
Stamatatos and Widmer (2005) address the
problem of identifying the most likely music per-
former, given a set of performances of the same
piece by a number of skilled candidate pianists.
They propose a set of very simple features for
representing stylistic characteristics of a music
performer that relate to a kind of “average” per-
formance. A database of piano performances of 22
pianists playing two pieces by Frédéric Chopin is
used. They propose an ensemble of simple classi-
fiers derived by both subsampling the training set
and subsampling the input features. Experiments
show that the proposed features are able to quantify
the differences between music performers.
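The general subsampling scheme can be illustrated with scikit-learn's BaggingClassifier, which draws a random subset of training instances and of input features for each ensemble member; the base learner and the subsampling rates below are our own choices for illustration, not the configuration reported by Stamatatos and Widmer.

```python
# Illustrative ensemble built by subsampling both training examples and
# input features; only a sketch of the general scheme described above.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

ensemble = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3),  # deliberately simple base learner
    n_estimators=50,
    max_samples=0.7,        # each member sees a random 70% of the performances
    max_features=0.5,       # ... and a random 50% of the stylistic features
    bootstrap=True,
    bootstrap_features=False,
    random_state=0,
)
# X: one feature vector per performance, y: pianist label
# ensemble.fit(X, y); ensemble.predict(X_new)
```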
Melodic Description

In this section, we outline how we extract a description of a performed melody from monophonic recordings. We use this melodic representation to provide a contextual and perceptual description of the performances and to apply machine learning techniques to these extracted features. That is, our interest is to obtain for each performed note a set of perceptual features (e.g., timbre) and a set of contextual features (e.g., the pitch of neighboring notes) from the audio recording. Thus, descriptors providing perceptual and contextual information about the performed notes are of particular interest.
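As an illustration of this target representation, a per-note record might combine the two kinds of features as in the sketch below; the field names are hypothetical placeholders rather than the exact descriptor set computed by the system described in this section.

```python
# Hypothetical per-note record combining perceptual and contextual features.
# Field names are illustrative placeholders only.
from dataclasses import dataclass

@dataclass
class NoteDescription:
    # perceptual features of the note itself
    pitch_hz: float
    duration_s: float
    energy: float
    brightness: float              # e.g., spectral centroid as a timbre cue
    # contextual features relating the note to its neighbors
    prev_interval_semitones: float
    next_interval_semitones: float
    prev_duration_ratio: float
    next_duration_ratio: float
```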
Extraction of Contextual Features
Figure 1 represents the steps that are performed
to obtain a melodic description from audio. First
of all, we perform a spectral analysis of a portion of sound, called the analysis frame, whose size is a parameter of the algorithm. This spectral analysis consists of multiplying the audio frame by an appropriate analysis window and computing a Discrete Fourier Transform (DFT) to obtain its spectrum. In this case, we use a frame width of 46 ms, an overlap factor of 50%, and a Kaiser-Bessel 25 dB window. Then, we compute a set of
low-level descriptors for each spectrum: energy
and an estimation of the fundamental frequency.
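A minimal sketch of this frame-based analysis, assuming NumPy, is shown below; the Kaiser window beta value and the autocorrelation-based fundamental-frequency estimate are our own assumptions rather than the exact choices made in the system.

```python
# Sketch of the frame-based analysis: 46 ms frames, 50% overlap, a Kaiser
# window standing in for the Kaiser-Bessel window (beta is an assumed value),
# a DFT per frame, and the two low-level descriptors (energy and an f0
# estimate; the autocorrelation method here is one common choice).
import numpy as np

def analyze(audio, sr, frame_ms=46, overlap=0.5, kaiser_beta=6.0):
    frame_len = int(sr * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))
    window = np.kaiser(frame_len, kaiser_beta)

    descriptors = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = audio[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))       # DFT of the windowed frame

        energy = float(np.sum(frame ** 2))          # low-level descriptor 1

        # crude autocorrelation-based fundamental frequency estimate
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        min_lag = int(sr / 1000.0)                  # ignore candidates above ~1 kHz
        lag = min_lag + int(np.argmax(ac[min_lag:]))
        f0 = sr / lag if ac[lag] > 0 else 0.0       # low-level descriptor 2

        descriptors.append({"time": start / sr, "energy": energy,
                            "f0": f0, "spectrum": spectrum})
    return descriptors
```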
From these low-level descriptors we perform
a note segmentation procedure. Once the note
Figure 1. Block diagram of the melody descriptor (blocks: audio signal, spectral analysis, low-level feature extraction, note segmentation, note descriptors computation, intra-note segmentation, intra-note segment descriptors computation, melodic description)