not work successfully with input from this class
of potential user.
of length three through six. With our input query
set, matching accuracy was completely unacceptable
for all but a few of the highest-quality queries in
our collection, even with a toy database of only
300 songs. Even the best of these poor results
placed the intended song, on average, only within
the top 25% of the database for our MUSI subjects,
and lower still for the other groups.
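The ranking criterion above can be made concrete with a small sketch. The scoring function is not specified at this point in the chapter, so the code assumes a simple one: each song is scored by the number of contour n-grams it shares with the query, and we report the rank of the intended song (ties counted against it). Contours are assumed to be strings over U (up), D (down) and = (same); the names `shared_ngrams` and `rank_of_target` and the toy contours are hypothetical illustrations, not the study's data or method.

```python
from collections import Counter


def ngram_counts(contour, n):
    """Multiset of length-n windows over a U/D/= contour string."""
    return Counter(contour[i:i + n] for i in range(len(contour) - n + 1))


def shared_ngrams(query, song, n):
    """Hypothetical score: n-grams common to query and song, with multiplicity."""
    return sum((ngram_counts(query, n) & ngram_counts(song, n)).values())


def rank_of_target(query, database, target, n):
    """1-based rank of the intended song; ties are counted against it."""
    target_score = shared_ngrams(query, database[target], n)
    return sum(1 for song in database.values()
               if shared_ngrams(query, song, n) >= target_score)


if __name__ == "__main__":
    # Made-up toy contours, for illustration only.
    database = {
        "song_a": "UUD=UDDU",
        "song_b": "DDUU=DUD",
        "song_c": "UUD=UDDD",
    }
    query = "UUD=U"  # contour of a hypothetical hummed query
    for n in (3, 4):
        rank = rank_of_target(query, database, "song_a", n)
        print(n, rank, rank / len(database))  # rank, fraction of database
```

Dividing the rank by the database size gives the "top X% of the database" figure used in the text.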
As Downie and Nelson also reported, n-gram
representation can be extremely sensitive to
errors, and, as we described earlier, our input
collection contained an average of one false or
incorrectly recorded note for every five accurately
transcribed notes. Shorter n-grams are less
susceptible to errors, but they offer little
discrimination power: an n-gram of size three
spans just four consecutive notes (three intervals),
so there are only 27 (3 × 3 × 3) possible values
for an n-gram of this length. Longer n-grams
discriminate better, but a single error then
propagates to progressively more consecutive
n-grams in the encoded query string.
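This trade-off can be checked numerically. The sketch below assumes contours are strings with one symbol per interval over the alphabet U (up), D (down) and = (same); the function names and the toy contour are hypothetical. It counts the possible n-gram values (3^n) and how many aligned n-grams change when a single interval symbol is wrong. A single mis-sung note alters the intervals on both sides of it, so in practice it can touch up to n + 1 windows rather than n.

```python
# Contour alphabet: U (up), D (down), = (same): 3 symbols per interval.
ALPHABET = "UD="


def contour_ngrams(contour, n):
    """Every length-n window of a contour string (one symbol per interval)."""
    return [contour[i:i + n] for i in range(len(contour) - n + 1)]


def possible_values(n):
    """Number of distinct n-grams over the 3-symbol contour alphabet."""
    return len(ALPHABET) ** n


def differing_ngrams(a, b, n):
    """How many aligned n-grams differ between two equal-length contours."""
    return sum(x != y for x, y in zip(contour_ngrams(a, n), contour_ngrams(b, n)))


if __name__ == "__main__":
    base = "UUD=UDDUUD=DUUD=UDDU"          # toy contour: 20 intervals, 21 notes
    corrupt = base[:10] + "U" + base[11:]  # one interior interval mis-sung
    for n in range(3, 7):
        print(n, possible_values(n), differing_ngrams(base, corrupt, n))
```

For an error away from the ends of the query, the number of corrupted n-grams grows one-for-one with n, while the alphabet of possible n-gram values grows as 3^n.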
Since our study showed that one-semitone
intervals were usually exaggerated by our
subjects, we tested our database songs encoded
in two different ways. In one representation,
only consecutive notes of the same MIDI pitch
value were marked as “same” (“=”) in the pitch
contour. In the other, we also encoded intervals
of one semitone as “same”, since we expected
that most such instances represented a subject
who intended to hum the same pitch twice. This
change did not significantly improve the overall
results for any of the algorithms we devised and
tested.
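The two encodings differ only in the largest interval that still counts as "same". A minimal sketch, assuming pitches are integer MIDI note numbers; `pitch_contour` and `same_threshold` are hypothetical names. A threshold of 0 gives the first representation and a threshold of 1 the second.

```python
def pitch_contour(midi_pitches, same_threshold=0):
    """Encode each interval as U (up), D (down) or = (same).

    Intervals whose size in semitones is <= same_threshold are marked '='.
    same_threshold=0: only repeated MIDI pitches count as "same".
    same_threshold=1: one-semitone steps are also treated as "same".
    """
    symbols = []
    for prev, cur in zip(midi_pitches, midi_pitches[1:]):
        step = cur - prev
        if abs(step) <= same_threshold:
            symbols.append("=")
        elif step > 0:
            symbols.append("U")
        else:
            symbols.append("D")
    return "".join(symbols)


if __name__ == "__main__":
    notes = [60, 60, 61, 59, 64, 64]  # toy query in MIDI note numbers
    print(pitch_contour(notes))                    # strict encoding
    print(pitch_contour(notes, same_threshold=1))  # semitone-as-same encoding
```

With the looser threshold, the one-semitone step from 60 to 61 collapses to "=", while the larger steps keep their direction.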
Seeking ways to improve our results with this
type of data representation and search, we chose
to incorporate note duration information. We
created our own corresponding definition of
duration contour, in which each pair of consecutive
notes is marked as increasing in duration,
decreasing in duration, or roughly equal in
duration. Analysis of the data collected in our
experiments led us to define a contour transition
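A duration contour can be sketched the same way as a pitch contour. The chapter's actual threshold for "roughly equal" is not given here, so the `tolerance` value below is a hypothetical placeholder, and the ratio test (rather than an absolute difference) is an assumption chosen so the same tolerance applies at any tempo. Durations are assumed to be positive numbers in any consistent unit; the symbols '+', '-' and '=' are likewise illustrative.

```python
def duration_contour(durations, tolerance=0.2):
    """Encode each consecutive pair of note durations as '+', '-' or '='.

    '+' : the second note is longer by more than the tolerance ratio
    '-' : the second note is shorter by more than the tolerance ratio
    '=' : the two durations are roughly equal
    The default tolerance is a hypothetical value, not one from the study.
    """
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    upper = 1.0 + tolerance
    symbols = []
    for prev, cur in zip(durations, durations[1:]):
        ratio = cur / prev
        if ratio > upper:
            symbols.append("+")
        elif ratio < 1.0 / upper:
            symbols.append("-")
        else:
            symbols.append("=")
    return "".join(symbols)


if __name__ == "__main__":
    hummed = [0.5, 0.5, 1.0, 0.25, 0.27]  # toy note lengths in seconds
    print(duration_contour(hummed))
```

Using the bounds 1 + tolerance and its reciprocal keeps the "roughly equal" band symmetric for lengthening and shortening.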
MIR System Comparisons and Testing
Our own efforts were directed toward developing
effective schemes for representing hummed
input queries to maximize search discrimination
while minimizing the negative impact of query
errors of all the types we have described. Hand
in hand with an effective representation format
goes the need for search algorithms able to match
these search queries to a music database. We
assembled our own test database of over 3,600
songs, consisting mostly of the Greenhaus Digital
Tradition (n.d.) folksong database, which was
also used in the MELDEX (McNab et al., 1996;
1997) system. As described above, the first two
experiments of our study provided us with a set
of 172 input test phrases from fifteen subjects of
varying musical ability.
Contour-Based MIR
We created and tested many algorithms that
used pitch contour information to represent
both input queries and database entries. Our
goal was to produce a computationally efficient
search mechanism that was still robust enough
to accommodate the types and quantity of errors
we found in our experiments. The details of all
of these efforts are beyond the scope of this
chapter but can be found in Kline (2002). Here we
present the most significant general results of
our work.
Among our accomplishments, we independently
developed and tested the representation scheme
known as n-grams, first introduced by Ukkonen
(1992) and used at about the same time in MIR
systems developed by Uitdenbogerd and Zobel
(1998, 1999) and by Downie and Nelson (2000).
We performed many tests using n-grams