Figure 3. Parallel processing of documents and
queries aimed at retrieval of potentially relevant
documents
Retrieval can then be carried out by measuring
the similarity (or the distance) between the two
strings and ranking the retrieved documents by
decreasing similarity to (or increasing distance
from) the query. The complexity of these
techniques is linear in the size of the document
collection, because every document has to be
matched against the query.
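The exhaustive comparison described above can be sketched as follows. This is a minimal illustration, not the chapter's own method: it assumes melodies are already encoded as strings of symbols, uses edit distance against the best-matching substring of each document as the distance measure, and the names edit_distance_to_substring and rank_by_distance are hypothetical.

```python
def edit_distance_to_substring(query, document):
    """Smallest edit distance between the query and any substring of the document."""
    # Row 0 is all zeros: a match may start at any position in the document.
    prev = [0] * (len(document) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # matching i query symbols against nothing costs i deletions
        for j, d in enumerate(document, start=1):
            cost = 0 if q == d else 1
            curr.append(min(prev[j] + 1,          # drop a query symbol
                            curr[j - 1] + 1,      # skip a document symbol
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # A match may also end at any position in the document.
    return min(prev)


def rank_by_distance(query, collection):
    """Score every document in the collection, then sort by increasing distance."""
    scored = [(edit_distance_to_substring(query, melody), doc_id)
              for doc_id, melody in collection.items()]
    return sorted(scored)


# Toy collection (assumed data): melodies encoded as up/down/repeat symbols.
collection = {"d1": "UUDRUD", "d2": "DDRRUU", "d3": "RRRRRR"}
ranking = rank_by_distance("UDRU", collection)
```

Every call to rank_by_distance visits all the documents, which is the linear cost mentioned in the text.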
Indexing techniques do not require this
exhaustive comparison; in fact, the main
motivation behind indexing is its efficiency and
scalability, even for very large collections of
documents. Let us consider each index term as a pointer
to the list of the documents that contain it. It is
assumed that the number of documents in each list
is small compared to the number of documents
in the collection, apart from stop-words, which are
usually not used as index terms. This assumption
holds for textual documents, and it also applies
to music documents, because different melodies
are built on different thematic material. Index terms can
then be stored in efficient data structures, such
as hash tables that can be accessed in constant
time or binary search trees that can be accessed
in logarithmic time.
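As a rough sketch of the idea above, an index can be held in a hash table that maps each index term to the documents containing it. The code assumes each document has already been reduced to a list of index terms; the toy terms and the names build_index and lookup are illustrative assumptions.

```python
from collections import defaultdict


def build_index(collection):
    """Map each index term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, terms in collection.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def lookup(index, query_terms):
    """Count the query terms each document shares, touching only the matching lists."""
    hits = defaultdict(int)
    for term in set(query_terms):
        for doc_id in index.get(term, ()):  # average constant-time hash access
            hits[doc_id] += 1
    # Most shared terms first; ties broken by document identifier.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Toy collection (assumed data): each document is a list of index terms.
collection = {
    "d1": ["UUD", "UDR", "DRU"],
    "d2": ["DDR", "DRR"],
    "d3": ["UDR", "RRR"],
}
index = build_index(collection)
results = lookup(index, ["UDR", "DRU"])
```

Only the lists for the query terms are read, so document d2, which shares no term with the query, is never examined.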
The efficiency gained through indexing comes at
some cost in retrieval effectiveness. The main
issue is that, in order to be efficient, access
to the data structure requires an exact match
between document and query index terms. While this
requirement is reasonable for textual documents,
because the user is expected to spell the words
of the query correctly, in the music domain there
are many sources of mismatch that may affect
retrieval effectiveness. A melodic query can
either contain errors, due to imprecise recall of the
melody, or be a different variant of a particular
theme. These differences may affect the way index
terms are computed from the query and the
way they are represented. For this reason, some
specific aspects of music document indexing are
addressed in more detail.
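The effect of a single query error on exact matching can be illustrated with a small sketch. It assumes, purely for illustration, that index terms are overlapping sequences of three symbols (n-grams); the text does not prescribe this choice, and the names ngrams and shared_terms are hypothetical.

```python
def ngrams(symbols, n=3):
    """All overlapping subsequences of n consecutive symbols."""
    return [symbols[i:i + n] for i in range(len(symbols) - n + 1)]


def shared_terms(query, document, n=3):
    """Index terms that appear in both the query and the document."""
    return set(ngrams(query, n)) & set(ngrams(document, n))


document = "UUDRUDDR"       # assumed symbolic encoding of a melody
correct_query = "DRUD"      # exact excerpt of the document
erroneous_query = "DRRD"    # same excerpt with one symbol recalled wrongly

matches_correct = shared_terms(correct_query, document)
matches_erroneous = shared_terms(erroneous_query, document)
```

With these toy strings, the correct query shares both of its terms with the document, while the query with one wrong symbol shares none, because the error falls inside every term computed from it.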
involve the application of noise reduction and pitch
tracking techniques (de Cheveigné & Baskind,
2003), in the case of audio queries, followed by
an approach to segmentation that can be carried
out with the same technique used for document
segmentation or with different techniques tailored
to the peculiarities of the queries. Figure 3
represents the parallel processing of documents and
queries aimed at computing the RSV for ranking
relevant documents.
Efficiency and Effectiveness
It can be argued that all these steps, although useful
for indexing textual documents, are not necessary
for a music retrieval task that can be solved
directly within an approximate string matching
framework, as mentioned in the introduction of
this chapter. For instance, the main melodies of
music documents can be represented by arrays of
symbols, where the number of different symbols
depends on the kind of quantization applied to
melodic and rhythmic information. The user's
query is normally an excerpt of a complete melody
and thus can undergo the same representation.
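One possible quantization, given here only as an assumed example, reduces each melodic interval to three symbols: up, down, or repeat. The sketch takes pitches as MIDI note numbers and ignores rhythm altogether; the function name contour is hypothetical, and finer quantizations would use a larger alphabet.

```python
def contour(pitches):
    """Encode a pitch sequence as a string of U (up), D (down), and R (repeat)."""
    symbols = []
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("U")
        elif current < previous:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)


# Assumed example melody as MIDI note numbers (C C G G A A G).
melody = [60, 60, 67, 67, 69, 69, 67]
encoded = contour(melody)

# The same melody sung a fourth higher yields the same string.
transposed = contour([p + 5 for p in melody])
```

Because only the direction of each interval is kept, a query in a different key maps to the same symbols as the document, and a query excerpt can be encoded with the same function.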