received little attention in the past. This is mainly
due to two factors: (a) the high complexity of
the feature extraction process that is required to
characterize expressive performance, and (b) the
question of how to use the information provided
by an expressive performance model for the task
of performance-based interpreter identification.
To the best of our knowledge, the only group
working on performance-based automatic inter-
preter identification is the group led by Gerhard
Widmer. Saunders, Hardoon, Shawe-Taylor, and
Widmer (2004) apply string kernels to the problem
of recognizing famous pianists from their playing
style. The characteristics of performers playing
the same piece are obtained from changes in beat-
level tempo and beat-level loudness. From such
characteristics, general performance alphabets
can be derived, and pianists' performances can
then be represented as strings. They apply both
kernel partial least squares and Support Vector
Machines to this data.
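To make the string-based representation concrete, the sketch below shows one way that beat-level tempo and loudness changes could be discretized into a small performance alphabet; the symbols and the change threshold are our own illustrative assumptions, not the alphabet actually used by Saunders et al.

```python
# Hypothetical sketch: encode a performance as a string over a small
# "performance alphabet" by discretizing beat-level tempo and loudness
# changes. Symbols and threshold are illustrative, not the published alphabet.

def performance_string(tempi, loudness, threshold=0.02):
    """Map per-beat tempo and loudness curves to a symbol string."""
    def direction(prev, curr):
        change = (curr - prev) / prev
        if change > threshold:
            return "u"   # increase
        if change < -threshold:
            return "d"   # decrease
        return "s"       # roughly steady

    symbols = []
    for i in range(1, len(tempi)):
        t = direction(tempi[i - 1], tempi[i])        # tempo movement
        l = direction(loudness[i - 1], loudness[i])  # loudness movement
        symbols.append(t + l)   # e.g., "ud" = tempo up, loudness down
    return " ".join(symbols)

# Two beats of slowing down while getting louder -> "du du"
print(performance_string([120, 116, 112], [0.50, 0.55, 0.60]))
```

String kernels then compare two performances by counting shared subsequences of such symbols, which is what makes kernel partial least squares and Support Vector Machines applicable to this representation.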
Stamatatos and Widmer (2005) address the
problem of identifying the most likely music per-
former, given a set of performances of the same
piece by a number of skilled candidate pianists.
They propose a set of very simple features for
representing stylistic characteristics of a music
performer that relate to a kind of “average” per-
formance. A database of piano performances of 22
pianists playing two pieces by Frédéric Chopin is
used. They propose an ensemble of simple classi-
fiers derived by both subsampling the training set
and subsampling the input features. Experiments
show that the proposed features are able to quantify
the differences between music performers.
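The general subsampling scheme can be illustrated with scikit-learn's BaggingClassifier, which draws a random subset of training instances and of input features for each ensemble member; the base learner and the subsampling rates below are our own choices for illustration, not the configuration reported by Stamatatos and Widmer.

```python
# Illustrative ensemble built by subsampling both training examples and
# input features; only a sketch of the general scheme described above.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

ensemble = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3),  # deliberately simple base learner
    n_estimators=50,
    max_samples=0.7,        # each member sees a random 70% of the performances
    max_features=0.5,       # ... and a random 50% of the stylistic features
    bootstrap=True,
    bootstrap_features=False,
    random_state=0,
)
# X: one feature vector per performance, y: pianist label
# ensemble.fit(X, y); ensemble.predict(X_new)
```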
Melodic Description

In this section, we outline how we extract a description of a performed melody from monophonic recordings. We use this melodic representation to provide a contextual and perceptual description of the performances and to apply machine learning techniques to these extracted features. That is, our interest is to obtain for each performed note a set of perceptual features (e.g., timbre) and a set of contextual features (e.g., the pitch of neighboring notes) from the audio recording. Thus, descriptors providing perceptual and contextual information about the performed notes are of particular interest.
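As an illustration of this target representation, a per-note record might combine the two kinds of features as in the sketch below; the field names are hypothetical placeholders rather than the exact descriptor set computed by the system described in this section.

```python
# Hypothetical per-note record combining perceptual and contextual features.
# Field names are illustrative placeholders only.
from dataclasses import dataclass

@dataclass
class NoteDescription:
    # perceptual features of the note itself
    pitch_hz: float
    duration_s: float
    energy: float
    brightness: float              # e.g., spectral centroid as a timbre cue
    # contextual features relating the note to its neighbors
    prev_interval_semitones: float
    next_interval_semitones: float
    prev_duration_ratio: float
    next_duration_ratio: float
```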
Extraction of Contextual Features
Figure 1 represents the steps that are performed
to obtain a melodic description from audio. First
of all, we perform a spectral analysis of a portion of sound, called the analysis frame, whose size is a parameter of the algorithm. This spectral analysis consists of multiplying the audio frame by an appropriate analysis window and computing a Discrete Fourier Transform (DFT) to obtain its spectrum. In this case, we use a frame width of 46 ms, an overlap factor of 50%, and a Kaiser-Bessel 25 dB window. Then, we compute a set of
low-level descriptors for each spectrum: energy
and an estimation of the fundamental frequency.
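A minimal sketch of this frame-based analysis, assuming NumPy, is shown below; the Kaiser window beta value and the autocorrelation-based fundamental-frequency estimate are our own assumptions rather than the exact choices made in the system.

```python
# Sketch of the frame-based analysis: 46 ms frames, 50% overlap, a Kaiser
# window standing in for the Kaiser-Bessel window (beta is an assumed value),
# a DFT per frame, and the two low-level descriptors (energy and an f0
# estimate; the autocorrelation method here is one common choice).
import numpy as np

def analyze(audio, sr, frame_ms=46, overlap=0.5, kaiser_beta=6.0):
    frame_len = int(sr * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))
    window = np.kaiser(frame_len, kaiser_beta)

    descriptors = []
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = audio[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))       # DFT of the windowed frame

        energy = float(np.sum(frame ** 2))          # low-level descriptor 1

        # crude autocorrelation-based fundamental frequency estimate
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        min_lag = int(sr / 1000.0)                  # ignore candidates above ~1 kHz
        lag = min_lag + int(np.argmax(ac[min_lag:]))
        f0 = sr / lag if ac[lag] > 0 else 0.0       # low-level descriptor 2

        descriptors.append({"time": start / sr, "energy": energy,
                            "f0": f0, "spectrum": spectrum})
    return descriptors
```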
From these low-level descriptors we perform
a note segmentation procedure. Once the note
Figure 1. Block diagram of the melody descriptor (blocks: audio signal, spectral analysis, low-level feature extraction, note segmentation, note descriptors computation, intra-note segmentation, intra-note segment descriptors computation, melodic description)