Figure 4. Energy envelope and its linear approximation of a real excerpt with intranote segment limits
marked
is defined as the segment between the start of
the most negative of the computed slopes and the
note offset. Sustain is restricted to the remaining
segment. When the end of the attack and the start of
the release of a note coincide, the note is considered
to have no sustain segment.
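The release and sustain rules above can be sketched as follows. This is only a minimal illustration, assuming the linear approximation of the envelope is available as a list of (time, energy) breakpoints from note onset to note offset, and that the end of the attack has already been found by a separate step. The names `release_and_sustain`, `breakpoints` and `attack_end` are hypothetical and do not come from the original text.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start time, end time)


def release_and_sustain(
    breakpoints: List[Tuple[float, float]], attack_end: float
) -> Tuple[Segment, Optional[Segment]]:
    """Return (release, sustain) limits for one note.

    breakpoints: (time, energy) pairs of the piecewise-linear envelope
    approximation, ordered in time, first = onset, last = offset.
    attack_end: time at which the attack ends (assumed to be given).
    """
    # Slope of every linear piece of the approximation.
    slopes = [
        (e1 - e0) / (t1 - t0)
        for (t0, e0), (t1, e1) in zip(breakpoints, breakpoints[1:])
    ]
    # Release starts where the most negative slope starts
    # and runs until the note offset.
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    release_start = breakpoints[steepest][0]
    note_offset = breakpoints[-1][0]
    release = (release_start, note_offset)
    # Sustain is whatever remains between attack and release;
    # if the two limits coincide, the note has no sustain segment.
    if release_start > attack_end:
        sustain: Optional[Segment] = (attack_end, release_start)
    else:
        sustain = None
    return release, sustain
```

For instance, a note whose approximation rises, stays nearly flat and then falls yields a sustain segment between the end of the rise and the start of the fall, whereas a note that falls immediately after the attack yields `None` for the sustain.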
Intranote segment characterization. Once we
have found the intranote segment limits, we de-
scribe each one by its duration (absolute and rela-
tive to note duration), start and end times, initial
and final energy values (absolute and relative to
note maximum) and slope. For the stable part of
each note (sustain segment), we extract an aver-
aged spectral centroid and spectral tilt in order to
have timbral descriptors related to the brightness
of a particular execution. We compute the spectral
centroid as the frequency bin corresponding to
the barycenter of the spectrum, as given in Equation (5),
where fft is the fast Fourier transform of a frame,
N is the size of the fast Fourier transform, and k
is the bin index. For the spectral tilt, we perform
a linear regression of the logarithmic spectral
envelope between 2 kHz and 6 kHz, and take the
slope, expressed in dB/Hz.
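The descriptors in this paragraph can be illustrated with a short sketch. It is not the authors' implementation: it assumes the segment limits and energies are already known, that the spectrum of a frame is given as a list of magnitudes for bins 1..N, and that the raw magnitudes in dB stand in for the logarithmic spectral envelope. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, Sequence


def describe_segment(start: float, end: float, e_start: float, e_end: float,
                     note_start: float, note_end: float,
                     note_max_energy: float) -> Dict[str, float]:
    """Duration, time and energy descriptors of one intranote segment."""
    if end <= start or note_end <= note_start:
        raise ValueError("segment and note must have positive duration")
    duration = end - start
    return {
        "duration": duration,
        "relative_duration": duration / (note_end - note_start),
        "start": start,
        "end": end,
        "initial_energy": e_start,
        "final_energy": e_end,
        "relative_initial_energy": e_start / note_max_energy,
        "relative_final_energy": e_end / note_max_energy,
        "slope": (e_end - e_start) / duration,
    }


def spectral_centroid_bin(magnitudes: Sequence[float]) -> float:
    """Barycenter of the spectrum as a bin index, bins numbered 1..N."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # assumption: a silent frame maps to bin 0
    return sum(k * m for k, m in enumerate(magnitudes, start=1)) / total


def spectral_tilt_db_per_hz(freqs_hz: Sequence[float],
                            magnitudes: Sequence[float],
                            lo_hz: float = 2000.0,
                            hi_hz: float = 6000.0) -> float:
    """Least-squares slope of the dB spectrum between lo_hz and hi_hz."""
    points = [
        (f, 20.0 * math.log10(m))
        for f, m in zip(freqs_hz, magnitudes)
        if lo_hz <= f <= hi_hz and m > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two bins inside the band")
    n = len(points)
    mean_f = sum(f for f, _ in points) / n
    mean_db = sum(d for _, d in points) / n
    num = sum((f - mean_f) * (d - mean_db) for f, d in points)
    den = sum((f - mean_f) ** 2 for f, _ in points)
    return num / den
```

To obtain the averaged values mentioned above, the centroid and the tilt would be computed for each frame of the sustain segment and then averaged; the framing and averaging steps are left out of this sketch.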
Performance-Driven Interpreter Identification
In this section, we describe our approach to the
problem of recognizing saxophonists from their
playing style. In particular, we introduce the dif-
ferent note descriptors we use to characterize the
internal and contextual note properties (computed
as described in the previous section), as well as
the different algorithms we apply to identify
interpreters from their playing style.
Note Descriptors
We characterize each performed note by the fol-
lowing two sets of features:
Perceptual (intranote) features. The perceptual
features represent perceptual properties
of a note, specified as intranote
characteristics of the audio signal. The
perceptual features included in
the research reported here are the note's
attack level, sustain duration, sustain slope,
amount of legato with the previous note,
amount of legato with the following note,
mean energy, spectral centroid and spectral
tilt. That is, each performed note is
perceptually characterized by the tuple
\[ SC = \frac{\sum_{k=1}^{N} k \cdot \mathrm{fft}(k)}{\sum_{k=1}^{N} \mathrm{fft}(k)} \tag{5} \]
 