Melodic Query Input for Music Information Retrieval Systems - Intelligent Music Information Systems: Tools and Methodologies

In particular, systems which focus solely on single-note music transcription have progressed from early efforts such as those of Kuhn (1990), through those built into early MIR systems, including Ghias, Logan, Chamberlin and Smith (1995), Kageyama, Mochizuki and Takashima (1993), Kageyama and Takashima (1994), and McNab, Smith, Witten, Henderson and Cunningham (1996), and finally to commercial products such as Wildcat Canyon Software's Autoscore and Emagic's Logic (now sold by Apple). Now that this technology has become sufficiently mature to be viable in the commercial marketplace, it is feasible to build an MIR interface around it. Transcription systems such as these can produce accurate transcriptions of music much of the time, but consistently perfect transcriptions are by no means guaranteed. Many factors contribute to the errors that inevitably creep into the resulting representations of acoustic input.

The individual notes produced by most instruments and by the human voice are composed of sounds which resonate at a number of frequencies. The fundamental, or lowest, frequency is normally the strongest and is the one we label as the pitch of a note. Besides the fundamental frequency, integer multiples of that frequency, known as harmonics or overtones, also appear, but at lesser strengths than the fundamental. The relative strengths of the sounds centered at the various harmonic frequencies are characteristic of the instrument or voice making the sound. Occasionally, for a given note or for some short portion of the duration of that note, one of the harmonic frequencies (normally the first overtone) is incorrectly identified as the fundamental frequency. The note-tracking software interprets this as a jump in pitch of exactly one octave, because the first overtone has twice the frequency of the true fundamental.
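The octave error described above can be illustrated with a short Python sketch. It assumes the pitch tracker has already produced one frequency estimate per analysis frame for a single sustained note, and that fewer than half of the frames are wrong, so the median is a trustworthy reference. The function names and the `tolerance` parameter are hypothetical and are not taken from any of the systems cited in the text.

```python
import math
from statistics import median


def semitone_jump(f_from: float, f_to: float) -> float:
    """Signed interval from f_from to f_to, measured in semitones."""
    return 12.0 * math.log2(f_to / f_from)


def fold_octave_errors(track_hz, tolerance=1.0):
    """Fold frames lying about one octave from the track median back into place.

    `tolerance` (in semitones) is an illustrative value, not one reported
    in the literature discussed here.
    """
    ref = median(track_hz)
    fixed = []
    for f in track_hz:
        jump = semitone_jump(ref, f)
        if abs(jump - 12.0) <= tolerance:
            f = f / 2.0  # tracker locked onto twice the fundamental
        elif abs(jump + 12.0) <= tolerance:
            f = f * 2.0  # tracker reported half the fundamental
        fixed.append(f)
    return fixed


# A held A3 (about 220 Hz) in which two frames were misread one octave high.
track = [220.0, 221.0, 440.0, 442.0, 219.0, 220.0]
print(fold_octave_errors(track))
```

With this input the two frames near 440 Hz are halved and the rest are left alone. A real transcription system would have to apply such a check per note rather than to a whole phrase, since a genuine melodic leap of an octave must not be folded away.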

It is not uncommon for a person or an out-of-tune instrument to produce notes that are flat or sharp relative to the nearest semitone of the standard scale. A straightforward music transcription algorithm might use a strict cutoff halfway between two semitones, so that a note below the threshold is mapped to the lower semitone and a note above it to the higher one. A rendition whose notes occasionally cross this critical point would thus be recognized as if the user had switched keys during the course of the input, perhaps many times back and forth.
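A minimal sketch of such a strict-cutoff quantizer follows. It assumes equal temperament with A4 = 440 Hz and uses MIDI note numbers (A4 = 69) as the semitone grid; both choices, and all function names, are assumptions made for illustration rather than details given in the text.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def fractional_midi(freq_hz: float) -> float:
    """Pitch as a real-valued MIDI note number (69.0 is A4)."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def quantize_strict(freq_hz: float) -> int:
    """Map a frequency to a semitone using a cutoff halfway between semitones."""
    return math.floor(fractional_midi(freq_hz) + 0.5)


def hz(midi: float) -> float:
    """Frequency of a (possibly fractional) MIDI note number."""
    return A4_HZ * 2.0 ** ((midi - 69.0) / 12.0)


# The singer intends three A4s but is 45, 55 and 45 cents flat.
sung = [hz(69 - 0.45), hz(69 - 0.55), hz(69 - 0.45)]
print([quantize_strict(f) for f in sung])  # [69, 68, 69]
```

The three sung pitches differ from one another by only a tenth of a semitone, yet the middle one falls on the other side of the cutoff and is transcribed as G#4, so a repeated note is reported as a melody that moves down and back up.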

There are a number of reasonable approaches to dealing with the tuning problem. If a representation scheme is designed to ignore small pitch differences, as may be reasonable with imprecise vocal input, it is unaffected by the one-semitone differences we have discussed, and simply ignoring these anomalies is efficient and effective. Alternatively, an effective matching scheme may require the pitches of notes to be reported in more precise increments, as was suggested by Lindsay (1996) and eventually included in the MPEG-7 Melody Sequence DS representation (Gomez, Gouyon, Herrera & Amatriain, 2003).
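The two directions mentioned above can be contrasted in a small Python sketch. The first function is one hypothetical way to ignore small pitch differences: it reduces a melody to up/down/same symbols and treats any interval inside a `dead_zone` as "same". The second reports intervals in fractions of a semitone. Neither is the encoding of any cited system or of MPEG-7; the names, the dead-zone width, and the quarter-semitone resolution are all assumptions chosen for illustration.

```python
def contour(pitches, dead_zone=1.5):
    """Encode a melody as U/D/S symbols, treating small intervals as 'S'.

    `pitches` are semitone values (for example MIDI note numbers).
    """
    symbols = []
    for prev, cur in zip(pitches, pitches[1:]):
        step = cur - prev
        if step > dead_zone:
            symbols.append("U")
        elif step < -dead_zone:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def fine_intervals(pitches, resolution=0.25):
    """Successive intervals rounded to multiples of `resolution` semitones."""
    return [
        round((cur - prev) / resolution) * resolution
        for prev, cur in zip(pitches, pitches[1:])
    ]


intended = [60, 64, 64, 67, 62]
heard = [60, 63, 64, 67, 62]  # second note transcribed one semitone flat

print(contour(intended), contour(heard))  # USUD USUD
print(fine_intervals([60.0, 63.6, 64.1]))  # [3.5, 0.5]
```

The coarse contour is identical for the intended and the mistranscribed melody, at the cost of also discarding genuine one-semitone steps. The fine-grained intervals keep the tuning deviation visible, leaving the matching stage to decide how much of it to tolerate.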

Another attempted remedy to this problem was the adaptive tuning method proposed by McNab et al. (1996) and implemented as part of their MELDEX system (McNab, Smith, Bainbridge & Witten, 1997): if the user's hummed note is sharp or flat relative to the Western tonal music scale, the algorithm adjusts its internal scale by an appropriate amount, so that the note that follows is assigned a pitch value relative to the tuning of the prior note rather than by comparison with the absolute tonal scale. Unfortunately, they did not report whether any formal testing had been performed to determine whether this method enables user input to be encoded more accurately, and Haus and Pollastri (2001) later found that this method actually decreased the accuracy of transcriptions compared to more naive algorithms. Autoscore incorporates as an option a simpler implementation of adaptive tuning which is performed only on the first note of an input phrase, so that all subsequent notes are assigned pitch values taking into account the degree to which the first note was out of tune. A more recent music transcription system produced by Haus and Pollastri (2001) utilizes some new techniques in identifying rela-
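The adaptive tuning idea described in this passage can be sketched as follows. This is only a reading of the prose description, not the MELDEX or Autoscore implementation: it assumes equal temperament with A4 = 440 Hz, one frequency estimate per note, and a running tuning offset that is shifted by each note's residual error. The `adapt` parameter and its values are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def fractional_midi(freq_hz: float) -> float:
    """Pitch as a real-valued MIDI note number (69.0 is A4)."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def transcribe(freqs_hz, adapt="all"):
    """Quantize notes to semitones, optionally re-tuning the internal scale.

    adapt="none":  fixed scale with a strict halfway cutoff
    adapt="first": shift the scale once, using the first note only
    adapt="all":   shift the scale after every note
    """
    offset = 0.0  # estimated tuning offset of the performer, in semitones
    notes = []
    for i, f in enumerate(freqs_hz):
        adjusted = fractional_midi(f) - offset
        note = math.floor(adjusted + 0.5)
        notes.append(note)
        if adapt == "all" or (adapt == "first" and i == 0):
            offset += adjusted - note  # absorb this note's residual error
    return notes


def hz(midi: float) -> float:
    """Frequency of a (possibly fractional) MIDI note number."""
    return A4_HZ * 2.0 ** ((midi - 69.0) / 12.0)


# Three intended A4s, sung 45, 55 and 45 cents flat.
sung = [hz(69 - 0.45), hz(69 - 0.55), hz(69 - 0.45)]
for mode in ("none", "first", "all"):
    print(mode, transcribe(sung, adapt=mode))
```

On this consistently flat input the fixed scale reports a spurious semitone dip, while both adaptive variants report three equal notes. The example is constructed to favor adaptation and says nothing about accuracy on real singing, where a drifting offset can also accumulate errors.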
