algorithms could be made to account accurately
for intended rests in hummed queries.
One might expect that some of the difference
between durations and INOT values could be
attributed to the manner of vocalization required
by the note-tracking software. It is possible that
the consonant sounds added by the subjects in
singing "da" contributed to this difference in certain
cases, but further analysis of the collected data
makes this explanation unlikely. Out of more than
4,000 notes recorded in this experiment, 42% have
identical values for duration and INOT, meaning
that Autoscore reported the time between the end
of such a note and the start of the following note
as zero. This measure was also fairly consistent
across subjects: 9 of the 15 had between 37% and 47%
of their hummed notes exhibiting this equality.
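The duration/INOT equality measure can be illustrated with a short Python sketch. It assumes that INOT is the time from a note's onset to the next note's onset. This matches the statement above that equal duration and INOT values mean a zero gap before the following note. The `Note` record and its fields are hypothetical and do not reflect Autoscore's actual output format. A small tolerance is used because timings are floating-point values.

```python
from dataclasses import dataclass


@dataclass
class Note:
    # Hypothetical note record; not Autoscore's actual output format.
    onset: float     # seconds from the start of the recording
    duration: float  # seconds the note was sounded


def inot_values(notes):
    """INOT for every note except the last: next onset minus this onset."""
    return [nxt.onset - cur.onset for cur, nxt in zip(notes, notes[1:])]


def fraction_with_zero_gap(notes, tol=1e-9):
    """Fraction of notes (the last note has no INOT) whose duration equals
    its INOT, i.e. no silence before the following note starts."""
    inots = inot_values(notes)
    if not inots:
        return 0.0
    equal = sum(
        1 for note, inot in zip(notes, inots) if abs(inot - note.duration) <= tol
    )
    return equal / len(inots)


# Made-up example: the second note ends 0.1 s before the third begins.
notes = [Note(0.0, 0.5), Note(0.5, 0.4), Note(1.0, 0.5), Note(1.5, 0.5)]
print(fraction_with_zero_gap(notes))  # 2 of the 3 measurable notes
```

Applied per subject instead of over all notes, the same function would yield the per-subject percentages mentioned above.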
Experiment 3: Humming Novel Melodic Phrases
Lindsay (1996) performed a small study with 6
subjects to see how well they could vocally
reproduce a series of 32 five-note sequences. The
sequences were carefully constructed so that,
among all of them, each pitch interval between
-7 and +7 semitones appeared approximately an
equal number of times. He tested the subjects'
accuracy by comparing their input phrases against
all 32 test phrases; an edit distance was computed
by summing the differences between the pitch
interval values and the corresponding values of
the target phrase, with the lowest-scoring phrase
selected as the best match. For the five subjects
with significant musical training, the correct
phrase was identified on average 86% of the time.
The sole nonmusician subject fared much worse,
matching the intended stimulus in only 41% of the
trials. Lindsay also tabulated how well the users'
data matched when encoded using ternary pitch
contours, finding that the musician group produced
the correct contour in 96% of the trials, while the
nonmusician was correct 72% of the time. These
results are compared with our own in Table 4
below. For the pitch interval representations,
Lindsay also reported the percentage of trials in
which the correct target phrase was ranked either
first or second, since each possible five-note
contour sequence appeared twice in the trials. We
report this value under the column heading 1st/2nd.
In this and subsequent tables, we refer to the
grouped results of Lindsay's musician subjects as
LIN-M and to the results of his nonmusician subject
as LIN-N.
Lindsay concluded from his subjects' excellent
performance in the study that pitch intervals
could be used as an accurate representation of
hummed input, at least for those with significant
musical training. Although his study was the most
comprehensive of its kind we have found, it had
a few limitations that caused us to question
whether his proposed representation would lead
to algorithms capable of identifying hummed
queries from the average user:
- Only six subjects participated in the experiment;
  five of the six were musicians averaging 12 years
  of experience, and two of them had significant
  vocal training.
- Our own informal experience in hearing the
  singing and humming of random people suggested
  that attempts to intone larger pitch intervals
  were more likely to induce reproduction errors,
  but the largest pitch interval tested in the
  study was seven semitones.
- The representation scheme was tested by
  attempting to match individuals' input phrases
  against a music collection consisting only of
  the 32 test phrases used in the experiment.
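Lindsay's matching procedure, as described above, can be sketched in Python. This is a minimal illustration under our own assumptions, not Lindsay's code:

- Phrases are lists of hypothetical MIDI note numbers.
- "Summing the differences" is taken to mean summing absolute differences between corresponding intervals.
- Ties in the ranking are broken by target order.

The sketch also shows the ternary (up/down/same) contour encoding. The helper `correct_within` checks whether the intended target ranks among the top n matches; with `top_n=2` it corresponds to the 1st/2nd measure. All function names are illustrative.

```python
def intervals(pitches):
    """Successive pitch intervals in semitones; a five-note phrase yields four."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def interval_distance(query, target):
    """Sum of absolute differences between corresponding pitch intervals
    (taking the absolute value is our assumption)."""
    if len(query) != len(target):
        raise ValueError("phrases must have the same number of notes")
    return sum(abs(q - t) for q, t in zip(intervals(query), intervals(target)))


def rank_targets(query, targets):
    """Target indices from best (lowest distance) to worst; sorted() is
    stable, so tied targets keep their original order."""
    return sorted(range(len(targets)), key=lambda k: interval_distance(query, targets[k]))


def correct_within(query, targets, correct_index, top_n=1):
    """True if the intended target is among the top_n ranked matches."""
    return correct_index in rank_targets(query, targets)[:top_n]


def contour(pitches):
    """Ternary contour: U (up), D (down), S (same) for each interval."""
    return "".join("U" if i > 0 else "D" if i < 0 else "S" for i in intervals(pitches))


# Hypothetical test phrases as MIDI note numbers (60 = middle C).
targets = [
    [60, 62, 64, 65, 67],  # intervals +2 +2 +1 +2
    [60, 59, 57, 55, 53],  # intervals -1 -2 -2 -2
    [60, 67, 60, 67, 60],  # intervals +7 -7 +7 -7
]
query = [60, 62, 63, 65, 67]  # an attempt at targets[0] with a wrong third note

print(rank_targets(query, targets))            # [0, 1, 2]
print(correct_within(query, targets, 0))       # True
print(contour(query) == contour(targets[0]))   # True
```

Here the sung phrase has two interval errors (distance 2 from the intended target), yet it still ranks first. Its contour also matches, because the errors do not change any interval's direction.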
With these limitations in mind, we expanded upon
Lindsay's methodology for our experiment by
testing a larger set of subjects and by including
more variation in the pitch intervals used in the
testing trials. We rewrote 50% of his test phrases