testing was performed on a 598-song database
with 37 query samples.
In our own work, we created and tested many existing and new music representations, incorporating them into various algorithms that use approximate matching techniques along the lines of the systems described here and below.
As with our fast searching efforts described in
the previous section, we made use of pitch data,
duration data, and INOT data in our work. Here
we describe only our most successful technique,
which is named REPRED for relative-pitch,
relative-duration. Further details of the most
successful algorithms we developed can be found
in Kline and Glinert (2003).
The key insight behind REPRED came as a result of our development of a tempo estimator for our input query samples. Some of our algorithms used this estimator to express note durations in terms of beats. The resulting REPSCAD (relative-pitch, scaled-duration) algorithm proved fairly effective, but analysis of our input data set through our tempo estimator revealed a pattern in INOT values. While we already knew anecdotally that subjects typically compress long notes when humming or singing, we found that the drop-off could be predicted fairly well with a logarithmic transformation.
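As a rough illustration of the beat-scaling step used by REPSCAD, the following sketch converts inter-note onset times into beat units given a tempo estimate; the function name, units, and inputs are illustrative, not those of our actual implementation.

```python
# A minimal sketch of REPSCAD-style duration scaling, assuming a tempo
# estimate in beats per minute is already available from the tempo
# estimator. Names and units here are illustrative only.

def scale_to_beats(inots_ms, tempo_bpm):
    """Convert inter-note onset times (in milliseconds) to beat units."""
    ms_per_beat = 60_000.0 / tempo_bpm
    return [inot / ms_per_beat for inot in inots_ms]

# At 120 BPM one beat is 500 ms, so a 750 ms gap becomes 1.5 beats.
print(scale_to_beats([500, 750, 250], 120))  # [1.0, 1.5, 0.5]
```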
The resulting REPRED algorithm represents note durations as follows: for a given note x, the duration component is the base-2 logarithm of the ratio of the INOT value for note x to the INOT value of the note following it, that is, $\log_2(\mathrm{INOT}_x / \mathrm{INOT}_{x+1})$. Pitch values are represented as interval distances in semitones. REPRED then uses these two values within the approximate matching algorithm by means of a scaled linear combination.
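To make this concrete, here is a minimal sketch of the REPRED-style encoding and of a per-note cost of the kind a scaled linear combination implies; the weights, names, and the edit-distance framing are illustrative assumptions, since the actual scaling factors and matching details appear in Kline and Glinert (2003) rather than here.

```python
import math

def repred_encode(pitches, inots):
    """Encode a melody as REPRED-style pairs, as described above:
    a pitch interval in semitones and the log2 ratio of adjacent INOTs.
    pitches: MIDI note numbers; inots: inter-note onset times (any unit)."""
    return [
        (pitches[i + 1] - pitches[i],            # relative pitch (semitones)
         math.log2(inots[i] / inots[i + 1]))     # log2(INOT_x / INOT_{x+1})
        for i in range(len(pitches) - 1)
    ]

def note_cost(a, b, w_pitch=1.0, w_dur=1.0):
    """Scaled linear combination of the two components, of the kind that
    could serve as a per-symbol cost in an approximate (e.g., edit-distance)
    matcher. The weights are illustrative placeholders."""
    return w_pitch * abs(a[0] - b[0]) + w_dur * abs(a[1] - b[1])

# Example: C-D-E sung evenly, with the final note held twice as long.
query = repred_encode([60, 62, 64], [500, 500, 1000])
print(query)  # [(2, 0.0), (2, -1.0)]
```

Because both components are differences or ratios between adjacent notes, the encoding is invariant to transposition and to uniform changes in tempo, which is the property the relative-pitch, relative-duration name reflects.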
We found that the REPRED algorithm successfully identified the target song within the top ten results of a 3,600-song database search in 67% of our trials. When we removed from consideration the three subjects whose performances were consistently below par (one from each of our three skill groups), the success rate climbed to 78% of all trials.
A few other recent publications have suggested similar ideas. The matching algorithm in the CubyHum MIR system by Pauws (2002) also incorporates a duration ratio, but one that does not involve log scaling and is used in a different manner. A more recent system by Unal, Narayanan, and Chew (2004) similarly incorporates a duration ratio into its input representation. The MPEG-7 Melody Sequence description scheme, which was developed independently of us as our work was being completed, uses exactly the same method as our REPRED implementation to encode and represent note duration information (Gomez et al., 2003, p. 3).
Other Systems and Techniques
As part of our testing of REPRED, we submitted to the MELDEX system a small subset of our input test queries as digitized audio (.wav) files, each containing at least twelve hummed notes and each targeting a song known to be in the MELDEX database. For more than half of our tests, the system reported no matches whatsoever; others returned a list of results that did not contain the correct song; and in just one case did it correctly identify the song as the first title in the returned list of close matches. We cannot draw any definite conclusions from this informal test; the most common problem appears to have been that the MELDEX transcriber missed some of the hummed notes in our queries, but enough notes remained that it seems the queries should have returned some results, even if incorrect.
Kosugi et al. (2000) and Kosugi, Sakurai and
Morimoto (2004) produced an MIR system named
SoundCompass. Their original version utilized a
database of over 10,000 MIDI songs, while their
latest version has over 20,000. Their initial system
made use of Wildcat Canyon's Autoscore software
to handle the pitch transcription task, though they
have since made further improvements including