as “same” when the two duration values were
within 25% of one another. Based on the results
of our first experiment, we additionally applied
our duration contour scheme to INOT values
rather than raw duration values as reported by
our transcriber.
We found that these contour representations
had even worse matching discrimination than the
solely pitch-based ones. At best, INOT-based
contours using an n-gram representation placed
the intended target song only in the top 33% of
our 300-song database, on average, for our MUSI
group input queries. Utilizing INOT values instead
of raw note durations did, however, improve the
overall results by approximately 10% across our
search methods.
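A duration contour of this kind can be sketched as follows. The function name and the symbols (=, +, -) are illustrative rather than taken from the text, and the 25% tolerance is applied symmetrically here as one plausible reading of "within 25% of one another":

```python
def duration_contour(durations, tolerance=0.25):
    """Map consecutive note durations to a same/longer/shorter string.

    Two durations count as "same" when their ratio stays within the
    tolerance (25% by default). Durations are assumed positive.
    """
    contour = []
    for prev, cur in zip(durations, durations[1:]):
        if max(prev, cur) / min(prev, cur) <= 1.0 + tolerance:
            contour.append("=")   # same (within 25% of one another)
        elif cur > prev:
            contour.append("+")   # longer
        else:
            contour.append("-")   # shorter
    return "".join(contour)

# Quarter, quarter, half, eighth note (durations in beats):
print(duration_contour([1.0, 1.0, 2.0, 0.5]))  # → "=+-"
```

The same classifier could be run over INOT values instead of raw durations by passing inter-onset times as the input list, which is the substitution described above.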
We further developed many other variants
combining both pitch and INOT information in
various ways to improve search accuracy, but none
of these efforts proved fruitful in the end. Our
conclusion was that contour-based representations
are simply unlikely to work for MIR systems
in the presence of input query errors caused by
non-expert vocal input. We then turned to more
computationally complex strategies, finding
success with approximate matching algorithms
based on the work of Smith and Waterman (1981),
an approach also used by other MIR researchers.
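As a concrete illustration of this family of algorithms, the sketch below applies Smith-Waterman local alignment to contour symbol strings. The scoring values (match, mismatch, gap penalties) are placeholders for illustration, not the parameters used by the systems discussed here:

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score between two symbol
    sequences (e.g., pitch-contour strings).

    Scoring parameters are illustrative; real MIR systems tune them
    to the error characteristics of hummed input.
    """
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] holds the best local alignment score ending at
    # query[:i] / target[:j]; row 0 and column 0 stay zero.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (
                match if query[i - 1] == target[j - 1] else mismatch
            )
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# A hummed contour fragment aligns against a longer target melody;
# here the query occurs exactly within the target, scoring 5 * 2:
print(smith_waterman("UUDSU", "SUUDUUDSUD"))  # → 10
```

Because the alignment is local, a correct fragment buried inside a longer (and partly wrong) hummed query can still score well against the target, which is what makes this approach robust to the input errors described above.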
Approximate Matching MIR

Kageyama et al. (1993, 1994) was one of the first
groups to implement a voice transcription system
with an integrated search mechanism, using a
500-song test database in 1993. Hummed notes
were translated into a scale-step (semitone)
representation, which was then subjected to an
approximate pattern-matching algorithm in order
to locate a match between the transcribed input
phrase and a song in the database. The accuracy
of the system was reported to be fair, normally
requiring 12 or more notes to find a match against
the 500-song database.
Ghias et al. (1995) also created their own music
transcription system and used its output to build
a musical database search mechanism in 1995,
using an even smaller test database. Their system
performed approximate matching searches using
ternary pitch contour information. One of the ideas
the authors proposed to improve searching was to
increase the resolution of the melodic contour data,
so that a note transition is labeled not merely as
increasing or decreasing, but with some sense of
magnitude as well (slightly increasing vs.
significantly increasing). However, they did not
perform any experiments to test the feasibility of
such a representation or the limits of its resolution
(i.e., how many distinct subdivisions of "increasing"
can be used and where their boundaries lie). As we
have shown, this kind of representation, which
also forms the basis of the relatively new MPEG-7
Melody Contour description, will certainly not be
effective for input queries with errors due to the
combined effects of nonmusician humming skills
and automated transcription errors.
The MELDEX system mentioned earlier
became part of the New Zealand Digital Library
project (Bainbridge, Nevill-Manning, Witten,
Smith & McNab, 1999) and is currently available
online (University of Waikato, n.d.). A Web-based
interface allows for hummed input queries, and the
system uses approximate matching algorithms
incorporating pitch contour, pitch intervals, and/or
rhythm information. Previous versions of the
system allowed the user control over certain
aspects of the search algorithm, but at the time of
this writing these options have been removed.
Hu and Dannenberg (2002) presented a series
of approximate matching algorithms that took into
account various features of hummed input queries.
They explored a promising technique in which no
attempt is made to differentiate individual notes
in the hummed input; instead, pitch values are
reported in 100ms frames and then scaled to many
different tempos and keys so that the matching
algorithm can be run many times. Preliminary