tive pitch errors in a user's hummed input over the entire length of the input phrase in order to adjust the tuning. They performed a limited study with five subjects, running recordings of the volunteers' hummed musical phrases through four different pitch transcription algorithms; their transcriber was shown to produce significantly more accurate results than the other methods described earlier.
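The tuning adjustment described above can be illustrated with a small sketch. It assumes, hypothetically, that the transcriber has already converted each hummed note into a fractional MIDI note number (60.0 is middle C, 60.5 is a quarter tone sharp of it), and it follows the singer's drifting pitch reference with simple exponential smoothing. The function name `track_tuning` and the smoothing factor `alpha` are illustrative choices, not the method used in the study described here.

```python
from typing import List


def track_tuning(pitches: List[float], alpha: float = 0.5) -> List[int]:
    """Quantize fractional MIDI pitches to semitones while tracking a
    slowly drifting tuning offset (in semitones).

    Hypothetical sketch: the offset is an exponentially smoothed average
    of how far recent notes sat from the equal-tempered grid.
    """
    offset = 0.0  # current estimate of the singer's tuning error
    notes: List[int] = []
    for p in pitches:
        # Remove the estimated drift, then snap to the nearest semitone.
        # (Python's round() rounds exact .5 ties to the even integer.)
        q = round(p - offset)
        # Deviation of this note from the grid, including accumulated drift.
        deviation = p - q
        # Blend the new deviation into the running offset estimate.
        offset = (1 - alpha) * offset + alpha * deviation
        notes.append(q)
    return notes


if __name__ == "__main__":
    flat_phrase = [60.0, 61.8, 63.6, 64.4, 66.2]  # singer gradually goes flat
    print([round(p) for p in flat_phrase])  # naive rounding
    print(track_tuning(flat_phrase))        # drift-aware rounding
```

For a phrase that gradually goes flat, such as `[60.0, 61.8, 63.6, 64.4, 66.2]`, naive rounding yields `[60, 62, 64, 64, 66]`, while `track_tuning` returns `[60, 62, 64, 65, 67]`. A larger `alpha` follows drift more quickly but also lets a single badly sung note pull the reference off course.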
In addition to the problems of correctly identifying the fundamental frequency and of processing out-of-tune notes, a third class of errors stems from the mechanisms that note-tracking algorithms use to determine note boundaries, particularly for successive notes of the same pitch. While sustaining a long note, a user might change volume or shift position relative to the microphone, causing dropouts that result in the single performed note being reported as several shorter notes. Conversely, a sequence of short notes intoned with little or no gap between them might be reported with some of the notes incorrectly merged. The frequency and severity of these errors also depend on the environment in which the system is used (e.g., the quality of the microphone and the background noise it picks up along with the intended vocal input), which may magnify the effects described here. Many systems, such as Autoscore, attempt to overcome the problem of note transition boundaries by requiring the user to intone each note on a syllable beginning with a consonant, such as “da”, rather than humming. Other researchers, such as Hu and Dannenberg (2002) and Zhu and Shasha (2003), have created systems that do not rely on correctly identifying individual note boundaries; instead, they work directly with the time series of successive pitch values and use different techniques to form representations suited to their respective search methods.
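The trade-off between splitting a sustained note at a dropout and merging repeated notes can be seen in a minimal frame-based segmenter. This sketch assumes a hypothetical input of per-frame pitch estimates in fractional MIDI numbers, with `None` marking unvoiced frames (silence or a dropout); the function `segment_notes` and its parameters `max_gap` and `min_len` are illustrative assumptions rather than values taken from any of the systems mentioned.

```python
from typing import List, Optional, Tuple

# (start_frame, end_frame_exclusive, midi_note)
Note = Tuple[int, int, int]


def segment_notes(frames: List[Optional[float]], max_gap: int = 3,
                  min_len: int = 2) -> List[Note]:
    """Group per-frame pitch estimates into notes (hypothetical sketch).

    A run of at most `max_gap` unvoiced frames between two stretches of the
    same semitone is treated as a dropout and bridged; a longer gap or a
    change of semitone starts a new note. Notes with fewer than `min_len`
    voiced frames are discarded as noise.
    """
    notes: List[Note] = []
    start: Optional[int] = None        # first frame of the note being built
    pitch: Optional[int] = None        # its semitone (rounded MIDI number)
    last_voiced: Optional[int] = None  # last voiced frame assigned to it
    voiced_count = 0                   # voiced frames assigned to it

    def close() -> None:
        # Emit the current note unless it is too short to be real.
        if start is not None and voiced_count >= min_len:
            notes.append((start, last_voiced + 1, pitch))

    for i, f in enumerate(frames):
        if f is None:
            continue  # unvoiced frame: may be a dropout or a real gap
        semitone = round(f)
        if (start is not None and semitone == pitch
                and i - last_voiced - 1 <= max_gap):
            # Same semitone after a short (or no) gap: extend the note.
            last_voiced = i
            voiced_count += 1
        else:
            # Pitch change or long gap: finish the old note, start a new one.
            close()
            start, pitch, last_voiced, voiced_count = i, semitone, i, 1
    close()
    return notes


if __name__ == "__main__":
    long_note = [60.1, 60.0, 59.9, None, None, 60.2, 60.0]
    print(segment_notes(long_note, max_gap=3))
    print(segment_notes(long_note, max_gap=1))
```

With the frames `[60.1, 60.0, 59.9, None, None, 60.2, 60.0]`, `max_gap=3` reports one note, `[(0, 7, 60)]`, while `max_gap=1` splits it into `[(0, 3, 60), (5, 7, 60)]`. No single value of `max_gap` suits every input: a gap long enough to bridge a dropout is also long enough to fuse two genuinely separate notes of the same pitch, which is one motivation for the “da” convention mentioned above.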
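Systems that avoid note segmentation, such as those of Hu and Dannenberg (2002) and Zhu and Shasha (2003), compare pitch time series directly. Their actual representations and search methods differ from what follows; as a generic illustration only, this sketch uses plain dynamic time warping (DTW) over two frame-level pitch contours, subtracting each contour's mean so that a query hummed in a different key can still match. The function `dtw_distance` and the mean-based normalization are assumptions made for the example.

```python
from typing import List, Sequence


def dtw_distance(query: Sequence[float], target: Sequence[float]) -> float:
    """DTW distance between two pitch contours (fractional MIDI per frame).

    Each contour is shifted to zero mean first, a rough form of key
    (transposition) normalization. Hypothetical sketch, O(n * m).
    """
    if not query or not target:
        raise ValueError("contours must be non-empty")
    qm = sum(query) / len(query)
    tm = sum(target) / len(target)
    q = [x - qm for x in query]
    t = [y - tm for y in target]
    n, m = len(q), len(t)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning q[:i] with t[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(q[i - 1] - t[j - 1])  # local pitch mismatch
            cost[i][j] = d + min(
                cost[i - 1][j],      # query advances, target frame reused
                cost[i][j - 1],      # target advances, query frame reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    hummed = [60, 60, 62, 62, 64, 64]
    stored = [65, 65, 65, 67, 67, 67, 69, 69, 69]  # slower and transposed
    print(dtw_distance(hummed, stored))
```

A contour such as `[60, 60, 62, 62, 64, 64]` and a slower, transposed version `[65, 65, 65, 67, 67, 67, 69, 69, 69]` have a distance of `0.0`. The quadratic cost in time and memory is acceptable for short phrases, and mean subtraction is only a rough key normalization, since uneven note durations shift the mean.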
Thus, the current state of automatic transcription of single-voice music is quite usable yet still susceptible to errors from a variety of sources. We can expect the reliability and accuracy of automated transcription to continue to improve, but ambient noise, microphone positioning, and input quality will remain issues, and the errors they introduce into note sequences are likely to persist as well.
Survey of Relevant Studies on Human Humming Skills
Aside from all the possible sources of error in recording and transcribing a person's vocal musical input, there remains the issue of the mistakes the person makes, while singing or humming, with respect to the intended tune or musical phrase. To build an interface that can extract reliable information from a typical user's singing or humming, we must first understand the processes by which humans perceive, recognize, and remember musical information; a person's ability to reproduce music vocally rests squarely on this foundation. We present here a number of studies in psychology and music cognition, most of which focused on music perception and recognition rather than reproduction. Through this exposition we derive the motivations and sources of influence for our own experiments in music reproduction, which drew on the insights and techniques of several of these prior studies. From the collective results of these experiments, we present data and conclusions that contribute to a more complete model of human humming ability.
Music Perception and Recognition
How people remember both familiar and novel melodies strongly influences how they will attempt to reproduce a tune they have heard.
Working with a number of collaborators over the years, Dowling performed several studies