as “same” when the two duration values were
within 25% of one another. Based on the results
of our first experiment, we additionally applied
our duration contour scheme to INOT values
rather than raw duration values as reported by
our transcriber.
We found that these contour representations
had even worse matching discrimination than the
solely pitch-based ones. At best, INOT-based
contours using an n-gram representation placed
the intended target song only in the top 33% of
our 300-song database, on average, for our MUSI
group input queries. Utilizing INOT values instead
of raw note durations did, however, improve the
overall results by approximately 10% across our
search methods.
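A duration contour of this kind can be sketched as follows. The function name and the symbols (=, +, -) are illustrative rather than taken from the text, and the 25% tolerance is applied symmetrically here as one plausible reading of "within 25% of one another":

```python
def duration_contour(durations, tolerance=0.25):
    """Map consecutive note durations to a same/longer/shorter string.

    Two durations count as "same" when their ratio stays within the
    tolerance (25% by default). Durations are assumed positive.
    """
    contour = []
    for prev, cur in zip(durations, durations[1:]):
        if max(prev, cur) / min(prev, cur) <= 1.0 + tolerance:
            contour.append("=")   # same (within 25% of one another)
        elif cur > prev:
            contour.append("+")   # longer
        else:
            contour.append("-")   # shorter
    return "".join(contour)

# Quarter, quarter, half, eighth note (durations in beats):
print(duration_contour([1.0, 1.0, 2.0, 0.5]))  # → "=+-"
```

The same classifier could be run over INOT values instead of raw durations by passing inter-onset times as the input list, which is the substitution described above.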
We further developed many other variants
combining both pitch and INOT information in
various ways to improve search accuracy, but none
of these efforts proved fruitful in the end. Our
conclusion was that contour-based representations
are simply unlikely to work for MIR systems
in the presence of input query errors caused by
non-expert vocal input. We then turned to more
computationally complex strategies, finding
success with approximate matching algorithms
based on the work of Smith and Waterman (1981),
an approach also used by other MIR researchers.
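As a concrete illustration of this family of algorithms, the sketch below applies Smith-Waterman local alignment to contour symbol strings. The scoring values (match, mismatch, gap penalties) are placeholders for illustration, not the parameters used by the systems discussed here:

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score between two symbol
    sequences (e.g., pitch-contour strings).

    Scoring parameters are illustrative; real MIR systems tune them
    to the error characteristics of hummed input.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] holds the best local alignment score ending at
    # query[:i] / target[:j]; row 0 and column 0 stay zero.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (
                match if query[i - 1] == target[j - 1] else mismatch
            )
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# A hummed contour fragment aligns against a longer target melody;
# here the query occurs exactly within the target, scoring 5 * 2:
print(smith_waterman("UUDSU", "SUUDUUDSUD"))  # → 10
```

Because the alignment is local, a correct fragment buried inside a longer (and partly wrong) hummed query can still score well against the target, which is what makes this approach robust to the input errors described above.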
Approximate Matching MIR

Kageyama et al. (1993, 1994) was one of the first
groups to implement a voice transcription system
with an integrated search mechanism, using a
500-song test database in 1993. Hummed notes
were translated into a scale-step (semitone)
representation, which was then subjected to an
approximate pattern-matching algorithm in order
to locate a match between the transcribed input
phrase and a song in the database. The accuracy
of the system was reported to be fair, normally
requiring 12 or more notes to find a match against
the 500-song database.
Ghias et al. (1995) also created their own music
transcription system and used its output to build
a musical database search mechanism in 1995,
using an even smaller test database. Their system
performed approximate matching searches using
ternary pitch contour information. One of the ideas
the authors proposed to improve searching was to
increase the resolution of the melodic contour data,
so that a note transition is labeled not merely as
increasing or decreasing, but with some sense of
magnitude as well (slightly increasing vs.
significantly increasing). However, they did not
perform any experiments to test the feasibility of
such a representation or the limits of its resolution
(i.e., how many distinct subdivisions of "increasing"
can be used and where their boundaries lie). As we
have shown, this kind of representation, which
also forms the basis of the relatively new MPEG-7
Melody Contour description, will certainly not be
effective for input queries with errors due to the
combined effects of nonmusician humming skills
and automated transcription errors.
The MELDEX system mentioned earlier
became part of the New Zealand Digital Library
project (Bainbridge, Nevill-Manning, Witten,
Smith & McNab, 1999) and is currently available
online (University of Waikato, n.d.). A Web-based
interface allows for hummed input queries, and the
system uses approximate matching algorithms
incorporating pitch contour, pitch intervals, and/or
rhythm information. Previous versions of the
system allowed the user control over certain
aspects of the search algorithm, but at the time of
this writing these options have been removed.
Hu and Dannenberg (2002) presented a series
of approximate matching algorithms that took into
account various features of hummed input queries.
They explored a promising technique in which no
attempt is made to differentiate individual notes
in the hummed input; instead, pitch values are
reported in 100ms frames and then scaled to many
different tempos and keys so that the matching
algorithm can be run many times. Preliminary