replacing this with their own transcriber (Kosugi, personal correspondence, 2001). To our knowledge, this was the first published system designed to make significant use of duration information as an integral part of its matching algorithms: users were required to select a metronome tempo before humming, and the system created beat-based representations of the durations of the input query notes. Once the user input was recorded and encoded in this fashion, it was processed into a series of feature vectors used for search and matching. Their newest
SoundCompass version eliminates the need for
the metronome. In 2002, Kosugi kindly shared
with us 16 of his group's input query test samples
and their system's matching results so we could
compare the performance of REPRED. We found
that REPRED correctly identified every sample as
the highest-ranking match, even the seven which
gave SoundCompass difficulty. However, the dis-
parity in database size and composition prevents
more definite conclusions from this small test.
We have not made a comparison with the more recent version of the system, which performs additional feature extraction, incorporates INOT values rather than raw durations, and includes other improvements such as multithreaded parallel searches.
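To make the beat-based duration encoding concrete, the sketch below shows one way such a representation could be computed: given the metronome tempo the user selected, each transcribed note duration in seconds is converted to beats and snapped to a simple rhythmic grid. The tempo value, grid resolution, and function name are illustrative assumptions of ours, not details taken from SoundCompass.

from typing import List

def durations_to_beats(durations_sec: List[float],
                       tempo_bpm: float,
                       grid: float = 0.25) -> List[float]:
    """Convert note durations in seconds to beat counts, quantized to a grid.

    tempo_bpm : the metronome tempo the user hummed against
    grid      : quantization step in beats (0.25 = a sixteenth note in 4/4)
    """
    seconds_per_beat = 60.0 / tempo_bpm
    beats = []
    for d in durations_sec:
        raw = d / seconds_per_beat                      # duration expressed in beats
        snapped = max(grid, round(raw / grid) * grid)   # snap to the nearest grid step
        beats.append(snapped)
    return beats

# Example: four notes hummed against a 100 BPM metronome (0.6 s per beat)
print(durations_to_beats([0.61, 0.29, 1.22, 0.33], tempo_bpm=100))
# -> [1.0, 0.5, 2.0, 0.5]

Feature vectors for matching can then be built from beat values of this kind rather than from the raw timings.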
We were also able to compare the performance of REPRED to the SuperMBox system from Jang et al. (2001a). They implemented their own pitch transcription component, which uses additional heuristics in an attempt to smooth out some of the pitch-tracking errors we have described; they also eliminated the requirement of a consonant stopping sound between successive notes, allowing continuous humming or even singing with words. (However, they did not report on the relative accuracy of their transcription process.) They assume the tempo of the user's input is consistent and exploit this assumption by linearly scaling the query representation to manipulate its effective tempo, creating several time-stretched copies that are searched in parallel, using k-means clustering and a branch-and-bound tree search on the resulting pitch vectors to identify and rank the closest matches to a hummed query. A follow-up system named MIRACLE (Jang, Lee & Kao, 2001b) added the ability to run searches in parallel on several computers, as well as a fast front-stage algorithm that prunes the database; the remainder is then searched by the more complex algorithm. We tested our input samples
against the downloadable copy of SuperMBox
available at Jang's personal Web site (Jang et
al., n.d.). With databases of comparable size but
largely different tunes, we found SuperMBox
performed about as well as REPRED when it
was constrained to match only against the start
of songs; in its match-anywhere mode, it did not
perform as well. Again, the small size of the test
and the differences in database content preclude
a formal performance comparison.
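As a rough illustration of the two ideas just described (linear tempo scaling of the query, and a fast pruning stage ahead of a more expensive matcher), the following sketch stretches a query pitch vector to several candidate tempos, prunes the database with a cheap coarse comparison, and ranks only the survivors with a full-resolution distance. The scaling factors, the start-anchored mean-absolute-difference distance, the mean-centering used to discount transposition, and the two-stage cutoff are simplified stand-ins of our own; SuperMBox and MIRACLE use k-means clustering and a branch-and-bound tree search rather than the naive comparisons shown here.

import numpy as np

def stretch(query: np.ndarray, factor: float) -> np.ndarray:
    """Linearly time-stretch a frame-based pitch vector by `factor`."""
    new_len = max(2, int(round(len(query) * factor)))
    x_old = np.linspace(0.0, 1.0, len(query))
    x_new = np.linspace(0.0, 1.0, new_len)
    return np.interp(x_new, x_old, query)

def frame_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute pitch difference per frame, after removing transposition."""
    n = min(len(a), len(b))                  # compare against the start of the song
    a, b = a[:n] - a[:n].mean(), b[:n] - b[:n].mean()
    return float(np.abs(a - b).mean())

def search(query: np.ndarray, database: dict,
           factors=(0.8, 0.9, 1.0, 1.1, 1.25), keep: int = 10) -> list:
    """Two-stage, start-anchored search over several time-stretched query copies."""
    copies = [stretch(query, f) for f in factors]

    # Stage 1: cheap pruning using heavily downsampled frames
    coarse = sorted(
        (min(frame_distance(c[::8], song[::8]) for c in copies), name)
        for name, song in database.items()
    )
    candidates = [name for _, name in coarse[:keep]]

    # Stage 2: full-resolution comparison on the surviving candidates only
    ranked = sorted(
        (min(frame_distance(c, database[name]) for c in copies), name)
        for name in candidates
    )
    return [name for _, name in ranked]

# Hypothetical usage: the database maps tune names to pitch vectors sampled at the
# same frame rate as the query (pitch in semitones, one value per frame).
# results = search(np.asarray(query_frames), database)
# print(results[:5])   # best-ranked candidates first

In the real systems the two stages are far more sophisticated, and, as noted above, MIRACLE distributes the work so that the searches can also run in parallel on several computers.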
SUMMARY AND CONCLUSION
MIR systems that allow melody-based search queries will be most useful to the average person if hummed or sung input can be used to specify those queries. Given this allowance, there will always be errors due to the music transcription process itself, even with the anticipated continued improvement of such automated systems. Among the most significant sources of uncontrollable input error are these:
•	The recording environment often cannot be controlled. MIR systems deployed in public spaces, or which rely on wired or wireless telephone transmission, will invariably be subject to ambient noise, generating false notes.
•	Our own experience with test subjects showed the difficulty of properly adjusting the input level to minimize errors due to the natural volume of the subject as well as the subject's relative position with respect to the microphone.