Identifying Saxophonists from Their Playing Styles - Intelligent Music Information Systems: Tools and Methodologies


psycho-acoustical knowledge (Klapuri, 1999). In a second step, fundamental frequency transitions are also detected. Finally, both results are merged to find the note boundaries (onset and offset information).

corresponding to each note. This study is carried out by analyzing the envelope curvature and characterizing its shape, in order to estimate the limits of the intranote segments.

When observing the note energy envelopes from the saxophone recordings, we find that three segments (attack, sustain and release (Bernstein & Cooper, 1976)) are usually needed to form a description that fits the model schematically represented in Figure 2. We discarded the decay segment due to the general characteristics of the notes within the performances.

In order to extract these three characteristic segments, we study the smoothed derivatives in a way similar to that presented in (Jenssen, 1999), where partial amplitude envelopes are modeled for isolated sounds. The main difference is that we analyze the notes in their musical context, rather than in isolation. In addition, only three linear segments are considered. Moreover, instead of studying the contribution of all the partials, we obtain general intensity information from the total energy envelope. The procedure is carried out as follows.

Considering the energy envelope as a differentiable function over time, the points of maximum curvature can be taken as the points where the first derivative of the signal energy varies most, that is, the local maxima or minima of the second derivative.
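The idea of locating second-derivative extremes can be illustrated with a minimal sketch. It assumes the envelope has already been smoothed and is given as one energy value per frame; the function names `second_derivative` and `curvature_extremes` and the toy envelope are hypothetical, chosen only for illustration, and the central second difference is one possible discrete approximation rather than the method used in the chapter.

```python
def second_derivative(env):
    """Central second difference of an energy envelope (one value per frame).

    The first and last frames have no two-sided neighbourhood, so their
    second derivative is left at 0.0 (an assumption of this sketch).
    """
    n = len(env)
    d2 = [0.0] * n
    for i in range(1, n - 1):
        d2[i] = env[i + 1] - 2.0 * env[i] + env[i - 1]
    return d2


def curvature_extremes(env):
    """Return frame indices where the second derivative has a strict
    local maximum or a strict local minimum."""
    d2 = second_derivative(env)
    indices = []
    for i in range(1, len(d2) - 1):
        is_max = d2[i] > d2[i - 1] and d2[i] > d2[i + 1]
        is_min = d2[i] < d2[i - 1] and d2[i] < d2[i + 1]
        if is_max or is_min:
            indices.append(i)
    return indices


# Toy envelope (hypothetical data): linear attack, flat sustain, linear release.
toy_env = [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
segment_limits = curvature_extremes(toy_env)
```

On this idealized three-segment envelope the only extremes fall at the two corners, which would correspond to the end of the attack and the start of the release. A real envelope has many more extremes.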

Due to the characteristics of the audio signal, the energy envelope must first be smoothed by low-pass filtering, since the raw envelope typically has too many second-derivative extremes. Several smoothing steps are carried out in order to find a good cut-off frequency for the smoothing filter. The smoothed envelope should not differ much from the original one, to avoid loss of localization due to the filtering effect. Thus, for each smoothing step m, the error e_m between the original and the current envelope is computed. This is carried out by means of (1), where N is the length of the envelope in frames, env is the original envelope

Note Descriptors

We compute note descriptors using the note boundaries and the low-level descriptor values. The low-level descriptors associated with a note segment are computed by averaging the frame values within this note segment. Pitch histograms have been used to compute the note pitch and the fundamental frequency that represents each note segment, as found in (McNab, Smith, & Witten, 1996). This is done to avoid taking mistaken frames into account in the fundamental frequency mean computation. First, frequency values are converted into cents, by the following formula:

$$c = 1200 \cdot \frac{\log(f / f_{ref})}{\log 2}$$

where f_ref = 8.176 Hz. Then, we define histograms with bins of 100 cents and a hop size of 5 cents, and we compute the maximum of the histogram to identify the note pitch. Finally, we compute the frequency mean for all the points that belong to the histogram. The MIDI pitch is computed by quantization of this fundamental frequency mean over the frames within the note limits.
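The pitch-histogram procedure above can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text: frames with a non-positive fundamental frequency are treated as unvoiced and skipped, the mean is taken in Hz over the frames that fall in the most populated bin, and the MIDI pitch is obtained by dividing the cents value by 100 and rounding (reasonable because 8.176 Hz is approximately the frequency of MIDI note 0). The names `hz_to_cents` and `note_pitch` are hypothetical.

```python
import math

F_REF = 8.176  # Hz, reference frequency stated in the text


def hz_to_cents(f, f_ref=F_REF):
    """Convert a frequency in Hz to cents above f_ref."""
    return 1200.0 * math.log(f / f_ref) / math.log(2.0)


def note_pitch(frame_f0s, bin_width=100.0, hop=5.0):
    """Estimate (mean f0 in Hz, MIDI pitch) for the frames of one note."""
    # Assumption: f0 <= 0 marks an unvoiced or failed frame.
    voiced = [f for f in frame_f0s if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in this note segment")
    cents = [hz_to_cents(f) for f in voiced]

    # Overlapping histogram: 100-cent bins whose start advances by 5 cents.
    lowest = min(cents) - bin_width
    n_bins = int((max(cents) - lowest) / hop) + 1
    best_start, best_count = lowest, -1
    for k in range(n_bins):
        start = lowest + k * hop
        count = sum(1 for c in cents if start <= c < start + bin_width)
        if count > best_count:
            best_start, best_count = start, count

    # Average only the frames inside the winning bin, so that outliers
    # such as octave errors do not bias the mean.
    members = [f for f, c in zip(voiced, cents)
               if best_start <= c < best_start + bin_width]
    mean_f0 = sum(members) / len(members)

    # Assumption: with f_ref near MIDI note 0, cents / 100 is the MIDI number.
    midi = int(round(hz_to_cents(mean_f0) / 100.0))
    return mean_f0, midi
```

For example, calling `note_pitch([440.0, 441.0, 439.0, 440.5, 880.0, 220.0])` ignores the two octave-error frames and returns a mean close to 440 Hz with MIDI pitch 69.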

Extraction of Perceptual (Intranote) Features

Once we segment the audio signal into notes, we perform a characterization of each of the notes in terms of its internal features.

Intranote segmentation. The proposed intranote segmentation method is based on the study of the energy envelope contour of the note. Once onsets and offsets are located, we study the instantaneous energy values of the analysis frames
