Melodic Query Input for Music Information Retrieval Systems - Intelligent Music Information Systems: Tools and Methodologies

to the microphone (which may also change over the course of a single input sample).

database which needs to be examined with the more complex search and matching algorithms. Our own work with REPRED shows the value of the independently-developed MPEG-7 Melody Sequence DS representation as a suitable means of encoding and representing hummed input queries.

There have been some efforts to assemble test suites of data as a common resource for MIR researchers, such as the vocal input query collection of Unal et al. (2003), the compendium of music collections maintained by Byrd (2006), and the audio description contest associated with the 2004 ISMIR symposium (Cano, Gomez, Gouyon, Herrera, Koppenberger, 2006; ISMIR, 2004). MIR systems and their relative performance advantages can be evaluated more objectively when they are compared using common criteria, databases or test input. The MIR community as a whole will benefit from continuing this trend to cooperate in the testing and evaluation of disparate systems and algorithms.

• The duration of a typical extraneous note in a transcription is long enough that it cannot be distinguished from genuine short notes as vocalized by the subject.

In addition to these likely sources of error, further errors from the human input source must also be anticipated.

Pitch and duration contour representations are not suitable for vocal input due to the relatively high number of input errors in a typical query. While this type of encoding affords fast and efficient search algorithms, query strings must have a lower error rate for this type of search to be effective.

Even in the absence of errors introduced by the transcription process, the reliability of pitch contour is dependent upon the user's familiarity with the phrase to be vocalized. For unfamiliar melodies, even a simple ternary pitch contour will likely contain several errors, regardless of the musical skills of the user. Contour representations with more than three gradations are unlikely to correctly capture the intended melody in the vocal rendition of a nonexpert user.
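As an illustration of what a ternary pitch contour looks like, the following minimal sketch encodes a sequence of transcribed pitches as up/down/same symbols. The function name, the `tolerance` parameter, and the sample pitch values are assumptions made for this example; they are not taken from REPRED or from any system discussed in this chapter.

```python
def ternary_contour(pitches, tolerance=0.5):
    """Encode successive pitch changes as 'U' (up), 'D' (down) or 'S' (same).

    pitches   -- transcribed pitches in semitones (e.g., fractional MIDI numbers)
    tolerance -- hypothetical threshold, in semitones, below which a change
                 is treated as a repeated note
    """
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        delta = current - previous
        if delta > tolerance:
            symbols.append("U")
        elif delta < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


# Hypothetical hummed query; fractional values stand for slightly off-pitch singing.
query = [60.0, 60.2, 62.1, 64.0, 62.3, 62.2]
print(ternary_contour(query))
```

A single misjudged interval changes one symbol of the resulting string, which is why an exact string search over such contours degrades quickly when the query contains several errors.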

Rather than using actual note durations, systems can more accurately represent the rhythm of the user's input by encoding the onset times of individual notes. Additionally, users tend to shorten long notes, and the degree of compression depends on the musical ability of the user; representing onset times on a logarithmic scale rather than a linear scale reduces the negative effects of this phenomenon.
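One possible reading of this idea is sketched below: note onsets are converted to inter-onset intervals, and each interval is then expressed as a base-2 logarithm. The function names, the choice of base 2, and the sample onset times are assumptions made for illustration only; the chapter does not specify a particular formula.

```python
import math


def inter_onset_intervals(onsets):
    """Return the time differences between successive note onsets (seconds)."""
    intervals = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    if any(interval <= 0 for interval in intervals):
        raise ValueError("onset times must be strictly increasing")
    return intervals


def log_intervals(onsets):
    """Express each inter-onset interval on a base-2 logarithmic scale."""
    return [math.log2(interval) for interval in inter_onset_intervals(onsets)]


# Hypothetical data: the intended rhythm holds the third note for 2.0 s,
# while the sung version cuts it short to 1.5 s.
intended = [0.0, 0.5, 1.0, 3.0]
sung = [0.0, 0.5, 1.0, 2.5]

linear_error = inter_onset_intervals(intended)[-1] - inter_onset_intervals(sung)[-1]
log_error = log_intervals(intended)[-1] - log_intervals(sung)[-1]
print(linear_error, log_error)
```

On the logarithmic scale, shortening a long note by a fixed amount produces a smaller discrepancy than the same amount would on a short note, so the compressed long notes of a hummed query are penalized less heavily.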

As we have seen, the most recent MIR systems that include query-by-humming components still utilize more complex approximate matching algorithms in order to perform their searches. Continued improvements in accuracy and in computational efficiency can be expected by finding new data representations, parallel searching and pruning techniques to reduce the area of the

Acknowledgment

The author's work reported here was supported in part by research grants awarded by the National Science Foundation under contracts EIA-9214887, EIA-9214892, IIS-9213823, CCR-9527151 and EIA-9634485. The author gratefully acknowledges Naoko Kosugi of NTT Laboratories for providing a set of sample input queries from his group's research studies.

References

Attneave, F. & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical scaling. American Journal of Psychology, 84, 147-166.

Bainbridge, D., Nevill-Manning, C. G., Witten, I. H., Smith, L. A. & McNab, R. J. (1999). Towards
