In particular, systems that focus solely on
single-note music transcription have
progressed from early efforts such as those of
Kuhn (1990), through those built into early MIR
systems, including Ghias, Logan, Chamberlin
and Smith (1995), Kageyama, Mochizuki and
Takashima (1993), Kageyama and Takashima
(1994), and McNab, Smith, Witten, Henderson and
Cunningham (1996), and finally to commercial
products such as Wildcat Canyon Software's
Autoscore and Emagic's Logic (now sold by Apple).
Now that this technology has become mature
enough to be viable in the commercial marketplace,
it is feasible to build an MIR interface around it.
Transcription systems such as these can produce
accurate transcriptions of music much of the time,
but consistently perfect transcriptions are by no
means guaranteed. Many factors contribute to the
errors that inevitably creep into the
resulting representations of acoustic input.
The individual notes produced by most in-
struments and the human voice are composed of
sounds which resonate at a number of frequencies.
The fundamental, or lowest, frequency is normally
the strongest and the one we label as the pitch of a
note. Besides the fundamental frequency, integer
multiples of that frequency, known as harmonics
or overtones, also appear, usually at lower strengths
than the fundamental. The relative strengths of
the sounds centered at various harmonic frequen-
cies are a characteristic of the instrument or voice
making the sound. Occasionally, for a given
note or for a short portion of its duration,
one of the harmonic frequencies (normally
the first harmonic) is incorrectly identified as
the fundamental. The note-tracking
software interprets this as a jump in pitch of
exactly one octave, because the first harmonic is twice
the frequency of the true fundamental.
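The octave error can be made concrete with the usual conversion from frequency to a MIDI-style note number, n = 69 + 12 log2(f / 440), under which doubling f adds exactly 12 semitones. The Python sketch below assumes a per-frame pitch track in Hz and applies a simple, hypothetical heuristic that is not taken from any of the systems cited above: a frame lying roughly one octave from the previous accepted frame is folded back by a factor of two. The function names and the one-semitone tolerance are illustrative assumptions.

```python
import math

def hz_to_midi(freq_hz: float) -> float:
    """Fractional MIDI note number: A4 = 440 Hz maps to 69, one semitone = 1.0."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def fix_octave_errors(freqs_hz: list[float], tolerance: float = 1.0) -> list[float]:
    """Fold back frames that sit about one octave from the previous frame.

    Hypothetical heuristic: a rise of 12 +/- tolerance semitones is treated
    as the tracker reporting the first harmonic (2 x f0) instead of f0, and
    an equal drop is treated as a frame reported one octave too low.
    """
    if not freqs_hz:
        return []
    corrected = [freqs_hz[0]]
    for f in freqs_hz[1:]:
        jump = hz_to_midi(f) - hz_to_midi(corrected[-1])
        if abs(jump - 12.0) <= tolerance:
            f = f / 2.0  # harmonic mistaken for the fundamental
        elif abs(jump + 12.0) <= tolerance:
            f = f * 2.0  # frame one octave too low
        corrected.append(f)
    return corrected
```

For example, fix_octave_errors([220.0, 220.5, 441.0, 219.8]) returns [220.0, 220.5, 220.5, 219.8]. A heuristic this simple would also "correct" a genuine octave leap in the melody, so a practical tracker would need further evidence, such as the spectral energy at the candidate fundamental.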
It is not uncommon for a person or an out-of-tune
instrument to produce notes that are flat
or sharp relative to the nearest semitone of the standard scale. A
straightforward music transcription algorithm
might use a strict cutoff halfway between two
semitones, so that a note below the threshold is
mapped to the nearest lower semitone and a note above it
to the nearest higher one. A rendition whose notes occasionally cross
this critical point would thus be transcribed as if
the user had switched keys during the course of the
input, perhaps many times back and forth.
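The effect of a strict halfway cutoff is easy to reproduce. The sketch below assumes A4 = 440 Hz equal temperament and a hypothetical singer who means to repeat the same note (MIDI 60, middle C) but is consistently about 50 cents sharp, wobbling by a cent or two. The function names are illustrative, not taken from any cited system.

```python
import math

A4_HZ = 440.0  # assumed reference tuning (equal temperament)

def hz_to_midi(freq_hz: float) -> float:
    """Fractional MIDI note number; one semitone = 1.0."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)

def midi_to_hz(midi: float) -> float:
    """Inverse of hz_to_midi."""
    return A4_HZ * 2.0 ** ((midi - 69.0) / 12.0)

def quantize_naive(freq_hz: float) -> int:
    """Strict halfway cutoff: below .5 rounds down, at or above .5 rounds up."""
    return math.floor(hz_to_midi(freq_hz) + 0.5)

# A hypothetical singer aiming at MIDI 60 but roughly 50 cents sharp.
sung = [midi_to_hz(60 + cents / 100.0) for cents in (48, 52, 49, 51)]
transcribed = [quantize_naive(f) for f in sung]
```

Here the four sung notes, at +48, +52, +49 and +51 cents, are transcribed as 60, 61, 60, 61, even though the singer's intended pitch never changed.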
There are a number of reasonable approaches
to dealing with the tuning problem. If a representation
scheme is designed to ignore small pitch
differences, as may be reasonable with imprecise
vocal input, it would be unaffected by the one-semitone
differences we have discussed, and simply
ignoring these anomalies is efficient and effective.
Alternatively, an effective matching scheme
may require note pitches to be reported
in finer increments, as was suggested by
Lindsay (1996) and eventually included in the
MPEG-7 Melody Sequence DS representation
(Gomez, Gouyon, Herrera & Amatriain, 2003).
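Both approaches can be sketched side by side. The first function below stands in for a representation that ignores small pitch differences. It uses a coarse up/down/repeat contour, in the style of Parsons code, chosen here purely as an illustration, with an assumed threshold of 1.5 semitones. The second reports the interval between successive notes rounded to an assumed resolution of a quarter semitone. Neither is the MPEG-7 representation itself, and all names and parameter values are hypothetical.

```python
import math

def hz_to_midi(freq_hz: float) -> float:
    """Fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def midi_to_hz(midi: float) -> float:
    """Inverse of hz_to_midi, used here only to build example input."""
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)

def contour(freqs_hz: list[float], threshold: float = 1.5) -> str:
    """Coarse U/D/R contour; steps within +/- threshold semitones count as R (repeat)."""
    midi = [hz_to_midi(f) for f in freqs_hz]
    symbols = []
    for prev, cur in zip(midi, midi[1:]):
        step = cur - prev
        if step > threshold:
            symbols.append("U")
        elif step < -threshold:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)

def fine_intervals(freqs_hz: list[float], resolution: float = 0.25) -> list[float]:
    """Successive intervals in semitones, rounded to the given fraction of a semitone."""
    midi = [hz_to_midi(f) for f in freqs_hz]
    return [round((cur - prev) / resolution) * resolution
            for prev, cur in zip(midi, midi[1:])]

# Intended melody 60, 60, 67, 64, sung about half a semitone sharp with small wobbles.
sung = [midi_to_hz(m) for m in (60.48, 60.52, 67.49, 64.51)]
```

For this input, contour(sung) gives "RUD" and fine_intervals(sung) gives [0.0, 7.0, -3.0], both matching the intended melody, whereas rounding each note to the nearest semitone would give 60, 61, 67, 65. The price of the coarse contour is that genuine semitone steps also collapse to "R".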
Another attempted remedy to this problem was
the adaptive tuning method proposed by McNab
et al. (1996) and implemented as part of their
MELDEX system (McNab, Smith, Bainbridge
& Witten, 1997): if the user's hummed note is
sharp or flat relative to the Western tonal music
scale, the algorithm adjusts its internal scale by an
appropriate amount, so that the next note
is assigned a pitch value relative to the tuning
of the prior note rather than to the
absolute tonal scale. Unfortunately, they did not
report whether any formal testing had been done
to establish that this method allows
user input to be encoded more accurately, and Haus
and Pollastri (2001) later found that this method
actually decreased the accuracy of transcriptions
compared to more naive algorithms. Autoscore
optionally incorporates a simpler form of adaptive
tuning that is applied only to the first note of an
input phrase, so that all subsequent notes are
assigned pitch values that take into account the
degree to which the first note was
out of tune. A more recent music transcription
system produced by Haus and Pollastri (2001)
utilizes some new techniques in identifying rela-