Information Technology Reference
In-Depth Information
difficult. Therefore, it is normally assumed that
music is monophonic (produced using a single
instrument) and stored as scores in the database.
The pitch of each note is known. The common
query input form is humming. To improve pitch
tracking performance on the query input, a pause is
normally required between consecutive notes.
There are two pitch representations. In the
first method, each pitch except the first one is
represented as pitch direction (or change) relative
to the previous note. The pitch direction is either
U(up), D(down) or S(similar). Thus, each musical
piece is represented as a string of three symbols
or characters.
The second pitch representation method rep-
resents each note as a value based on a chosen
reference note. The value is assigned from a set
of standard pitch values that is closest to the es-
timated pitch. If we represent each allowed value
as a character, each musical piece or segment is
represented as a string of characters. But in this
case, the number of allowed symbols is much
greater than the three that are used in the first
pitch representation.
After each musical piece is represented as a
string of characters, the final stage is to find a match
or similarity between the strings. Considering
that humming is not exact and the user may be
interested in find similar musical pieces instead
of just the same one, approximate matching is
used instead of exact matching. The approximate
matching problem is that of string matching with
k mismatches. The variable k can be determined
by the user of the system. The problem consists
of finding all instances of a query string Q =
q1q2q3 . . . qm in a reference string R = r1r2r3
. . . rn such that there are at most k mismatches
(characters that are not the same). There are sev-
eral algorithms that have developed to address the
problem of approximate string matching (Wold
et al., 1996).
Both the systems of Muscle Fish LLC (Wold
et al., 1996) and the University Waikato produced
good retrieval performance. But the perform-
ance depends on the accuracy of pitch tracking
of hummed input signals. High performance is
only achieved when a pause is inserted between
consecutive notes.
Since humming is the most natural way to
formulate music queries for people who are not
trained or educated with music theory. Therefore
many researchers have proposed techniques for
query-by-humming.
Many techniques based on melody matching
are proposed, and some of them can support query-
by-humming (Logan, Ghias, & Chamberlin, 1995;
Zhu & Kankanhalli, 2002).
For a query-by-humming system to work well,
reliable pitch detection in the humming is critical.
There are 2 types of query-by-humming methods:
(1) the method based on music notes segmenta-
tion and matching; (2) and the method based on
continuous pitch contour matching.
The first type of methods (Pollastri, 2002)
requires the user to separate each music note
with a short silence or hum with a particular
syllable (such as “Da”). By such restrictions, it
is assumed that each individual note can he ac-
curately segmented by using signal energy. The
pitch for the segmented note is then estimated.
The restrictions, however, make the methods less
practical for a real-world music retrieval system,
since a user cannot always be aware of the note
boundaries particularly when there are tied notes
in the melody. Figure 20 illustrates the flow of
such pitch detection method.
The second type of query-by-humming
methods does not impose the above mentioned
restrictions. The pitch value for each audio (a short
time window) is estimated, and then melody is
represented using a pitch contour, or a time series
of pitch values with no music note identities.
Music retrieval is done by similarity match-
ing of the pitch contours (Jang, 2001; Zhu, 2002).
These approaches have shown a better perfor-
mance than the first type of method. Although
note segmentation error does not exist in this
Search WWH ::




Custom Search