psycho-acoustical knowledge (Klapuri, 1999).
In a second step, fundamental frequency transi-
tions are also detected. Finally, both results are
merged to find the note boundaries (onset and
offset information).
corresponding to each note. This study is carried
out by analyzing the envelope curvature and
characterizing its shape, in order to estimate the
limits of the intranote segments.
When observing the note energy envelopes
from the saxophone recordings, we find that
three segments (attack, sustain and release
(Bernstein & Cooper, 1976)) are usually enough
to describe a note in a way that fits the model
represented schematically in Figure 2. We
discarded the decay segment because of the
general characteristics of the notes within the
performances.
In order to extract these three characteristic
segments, we study the smoothed derivatives in
a way similar to that presented in (Jenssen, 1999),
where partial amplitude envelopes are modeled
for isolated sounds. The main difference is that
we analyze the notes in their musical context,
rather than in isolation. In addition, only three
linear segments are considered. Moreover, instead
of studying the contribution of all the partials, we
obtain general intensity information from the total
energy envelope. The procedure is carried out as
follows.
Treating the energy envelope as a differentiable
function of time, the points of maximum curvature
are taken to be the points where the first
derivative of the signal energy varies most, that
is, the local maxima or minima of the second
derivative.
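The idea of locating curvature points as second-derivative extremes can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the envelope is a plain list of per-frame energy values, uses central finite differences, and the function names are hypothetical.

```python
def second_derivative(env):
    """Central finite-difference second derivative of a sampled envelope."""
    return [env[i - 1] - 2.0 * env[i] + env[i + 1] for i in range(1, len(env) - 1)]


def curvature_candidates(env):
    """Return frame indices where the second derivative has a local extremum."""
    d2 = second_derivative(env)
    candidates = []
    for k in range(1, len(d2) - 1):
        is_max = d2[k] > d2[k - 1] and d2[k] > d2[k + 1]
        is_min = d2[k] < d2[k - 1] and d2[k] < d2[k + 1]
        if is_max or is_min:
            candidates.append(k + 1)  # d2[k] corresponds to env[k + 1]
    return candidates
```

On an idealized envelope made of a linear attack, a flat sustain and a linear release, the two candidates fall at the end of the attack and at the start of the release. Real envelopes are noisier and yield many more candidates.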
Due to the characteristics of the audio signal,
the energy envelope must first be smoothed
by low-pass filtering, since it otherwise contains
too many second-derivative extremes. Several
smoothing steps are carried out in order to find
a good cut-off frequency for the smoothing filter.
The smoothed envelope should not differ much from
the original one, to avoid the loss of localization
caused by filtering. Thus, for each smoothing step
m, the error e_m between the original and the
current envelope is computed. This is carried
out by means of (1), where N is the length of the
envelope in frames, env is the original envelope
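A rough sketch of such a stepwise smoothing loop is given below. Several parts are assumptions rather than the method described here: a centered moving average stands in for the low-pass filter (a wider window meaning a lower cut-off), the error is a mean absolute difference over the N frames and not necessarily the form of (1), and the stopping rule and all names are hypothetical.

```python
def moving_average(env, width):
    """Centered moving average of odd width; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(env)):
        lo = max(0, i - half)
        hi = min(len(env), i + half + 1)
        out.append(sum(env[lo:hi]) / (hi - lo))
    return out


def smoothing_error(env, smoothed):
    """Assumed error measure: mean absolute difference over the N frames."""
    n = len(env)
    return sum(abs(a - b) for a, b in zip(env, smoothed)) / n


def smooth_until(env, max_error, widths=(3, 5, 7, 9, 11)):
    """Try progressively stronger smoothing; keep the last step within max_error."""
    best = list(env)
    for width in widths:
        candidate = moving_average(env, width)
        if smoothing_error(env, candidate) > max_error:
            break  # this step departs too much from the original envelope
        best = candidate
    return best
```

The error tolerance trades off the number of spurious second-derivative extremes against how precisely the segment limits can be localized.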
Note Descriptors
We compute note descriptors using the note
boundaries and the low-level descriptor values.
The low-level descriptors associated with a note
segment are computed by averaging the frame
values within this note segment. Pitch histograms
have been used to compute the note pitch and the
fundamental frequency that represents each note
segment, as found in (McNab, Smith, & Witten,
1996). This is done to avoid taking mistaken
frames into account in the computation of the
fundamental frequency mean. First, frequency
values are converted into cents, by the following
formula:
$$c = 1200 \cdot \frac{\log\left(f / f_{\mathrm{ref}}\right)}{\log 2}$$
where f_ref = 8.176 Hz. Then, we define histograms
with bins of 100 cents and a hop size of 5 cents,
and we take the maximum of the histogram
to identify the note pitch. Finally, we compute the
frequency mean of all the points that belong to
that histogram maximum. The MIDI pitch is computed
by quantization of this fundamental frequency mean
over the frames within the note limits.
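The pitch-histogram procedure can be sketched as below, under stated assumptions: the input is a list of frame-wise fundamental frequency values in Hz for a single note, non-positive values mark frames without a pitch estimate, and the histogram is realized as a 100-cent bin sliding in 5-cent steps. The MIDI quantization as rounding of cents/100 relies on 8.176 Hz being approximately the frequency of MIDI note 0. All function names are hypothetical.

```python
import math

F_REF = 8.176  # reference frequency from the text, in Hz


def hz_to_cents(f, f_ref=F_REF):
    """Convert a frequency in Hz to cents above f_ref."""
    return 1200.0 * math.log(f / f_ref) / math.log(2.0)


def note_pitch(f0_frames, bin_width=100.0, hop=5.0):
    """Return (mean f0 in Hz, MIDI pitch) for the frame-wise f0 values of one note.

    Assumes at least one frame carries a positive f0 value.
    """
    # Assumption: non-positive values mark frames without a pitch estimate.
    voiced = [f for f in f0_frames if f > 0.0]
    cents = [hz_to_cents(f) for f in voiced]

    # Slide a bin_width-wide bin in steps of hop cents and keep the fullest one.
    first_lo = math.floor((min(cents) - bin_width) / hop) * hop
    n_bins = int((max(cents) - first_lo) / hop) + 1
    best_lo, best_count = first_lo, -1
    for k in range(n_bins):
        lo = first_lo + k * hop
        count = sum(1 for c in cents if lo <= c < lo + bin_width)
        if count > best_count:
            best_lo, best_count = lo, count

    # Average only the frames that fall in the winning bin, so outliers
    # such as octave errors do not bias the mean.
    members = [f for f, c in zip(voiced, cents) if best_lo <= c < best_lo + bin_width]
    mean_f0 = sum(members) / len(members)
    midi_pitch = round(hz_to_cents(mean_f0) / 100.0)
    return mean_f0, midi_pitch
```

For example, `note_pitch([440.0] * 8 + [880.0, 442.0, 438.0])` ignores the single octave error at 880 Hz and returns a mean of 440.0 Hz with MIDI pitch 69.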
Extraction of Perceptual (Intranote) Features
Once we segment the audio signal into notes, we
perform a characterization of each of the notes in
terms of its internal features.
Intranote segmentation. The proposed intra-
note segmentation method is based on the study
of the energy envelope contour of the note. Once
onsets and offsets are located, we study the in-
stantaneous energy values of the analysis frames
 