on the different indexing schemes, obtaining
a number of ranked lists of potentially relevant
documents, and combining the results into a single
ranked list using some fusion strategy. The approach
is named data fusion or collection fusion, where the
latter term more precisely addresses the problem
of combining the results from indexing
schemes built on different, and potentially
non-overlapping, collections of documents.
Collection fusion techniques are quite popular
in Web metasearch engines, which are services that
automatically query a number of conventional
Web search engines in parallel and present the
overall results in a single ranked list (Lee, 1997).
The advantages of a metasearch engine are a
higher coverage of Web pages, which is the
union of the coverage of the single search engines,
and improved retrieval effectiveness, both
in terms of recall, because more documents are
retrieved, and in terms of precision, because
multiple sources of evidence for the relevance of
some documents are available. The crucial point in
the development of a collection (or data) fusion
technique is the way the different ranked lists are
fused together. A number of constraints have to
be considered in typical collection fusion
applications: the indexing schemes of the
different search engines are not known; each engine
covers a different part of the overall set of documents;
the individual RSVs, or similarity scores, may
not be known to the metasearch engine; and, if known,
the RSVs may be expressed on different scales and
have different statistical distributions. For this
reason, some techniques have been proposed that use
only the information that is certain to be available:
the rank of each retrieved document for each search
engine (Fox & Shaw, 1994).
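As an illustration of fusion based on ranks alone, the following is a minimal sketch of a Borda-style count. It is chosen here only as one simple rank-based strategy and is not necessarily the technique of the cited work; the function name, the document identifiers, and the rule for documents missing from a list are all assumptions made for the example.

```python
from collections import defaultdict


def borda_fuse(ranked_lists):
    """Fuse ranked lists of document ids using only rank positions.

    Illustrative assumption: each list holds unique ids, best first.
    """
    # The pool is every document returned by at least one engine.
    pool = {doc for ranking in ranked_lists for doc in ranking}
    n = len(pool)
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # A document at 0-based position i earns n - i points.
        for i, doc in enumerate(ranking):
            scores[doc] += n - i
        # Documents this engine did not return share the leftover
        # points equally (one common convention, assumed here).
        missing = pool - set(ranking)
        if missing:
            leftover = sum(range(1, n - len(ranking) + 1))
            for doc in missing:
                scores[doc] += leftover / len(missing)
    # Highest score first; ties are broken by id for determinism.
    return sorted(pool, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    engine_a = ["d1", "d2", "d3"]
    engine_b = ["d2", "d4"]
    engine_c = ["d2", "d1", "d5"]
    print(borda_fuse([engine_a, engine_b, engine_c]))
```

Because only positions are used, the sketch needs no knowledge of the engines' indexing schemes or of the scale of their scores, which matches the constraints listed above.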
Most of these constraints do not hold when the
parallel indexes are built within the same retrieval
system, because there is complete control over each
weighting scheme and over the range and distribution
of the RSVs, which can be obtained by running the
same retrieval engine on the different
indexes. Even if this aspect is more related to the
retrieval than to the indexing of music documents,
it is worth mentioning an experiment on data
fusion of alternative indexing schemes.
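When the RSVs of every index are available and produced by the same engine, fusion can operate on scores instead of ranks. The sketch below shows one simple possibility, a weighted sum of min-max normalized scores; it is an assumption made for illustration, not the fusion rule of the experiment, and the function names, weights, and document identifiers are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: RSV} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every document gets the same value.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def sum_fuse(score_maps, weights=None):
    """Fuse per-index RSVs by a weighted sum of normalized scores.

    A document absent from an index contributes 0 for that index
    (an illustrative assumption).
    """
    if weights is None:
        weights = [1.0] * len(score_maps)
    fused = {}
    for w, scores in zip(weights, score_maps):
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    # Highest fused score first; ties are broken by id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical RSVs returned by two indexes for the same query.
    index_one = {"song_a": 4.0, "song_b": 2.0, "song_c": 0.0}
    index_two = {"song_b": 10.0, "song_c": 30.0}
    for doc, score in sum_fuse([index_one, index_two]):
        print(doc, score)
```

Normalization is kept as a separate step so that it can be dropped or replaced when the indexes are known to produce scores on the same scale.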
Fusion of different melodic descriptors
Even when a single dimension is used to extract
content descriptors, a number of choices
have to be made about the way lexical units are
computed, and these choices affect the effectiveness
of an indexing scheme. Let us consider the common
situation in which melodic information is used as the
content descriptor, using an example of a complete
evaluation of music indexing schemes.
The first choice in music indexing is how lexical
units are computed, as described in the previous
section. In the running example, the DD (Data Driven)
approach is used, where lexical units
are computed using a pattern analysis approach
presented in the preceding section, because it
gives high retrieval performance
while allowing for different lengths of the index
terms. The second step consists of choosing
whether to use absolute or relative features. The
third step regards the number of quantization levels
to be applied to each feature, which may range
from a single level, meaning that the feature
is not used in practice, to as many levels as
there are possible values, meaning that no
quantization is applied. Table 5 shows the different
combinations of time and pitch information in
melodic lexical units; the three cells marked with
an acronym in bold are the ones that have been
used in the experiment on data fusion, while the two
cells marked with “---” highlight combinations
that do not make sense.
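The second and third steps can be sketched as follows. This is a minimal illustration under stated assumptions: pitches are given as MIDI note numbers, relative pitch is the interval in semitones between consecutive notes, intervals are quantized by clipping, and durations are quantized against hypothetical bin boundaries. None of these choices is taken from the experiment itself.

```python
import bisect


def pitch_intervals(midi_pitches):
    """Relative pitch: semitone differences between consecutive notes."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]


def quantize_interval(interval, limit):
    """Clip an interval to [-limit, +limit], giving 2 * limit + 1 levels.

    limit = 0 collapses everything to one level (feature unused);
    a very large limit leaves the intervals unquantized.
    """
    return max(-limit, min(limit, interval))


def quantize_duration(duration, boundaries):
    """Map a duration to the index of its bin.

    boundaries must be sorted; they define len(boundaries) + 1 levels.
    An empty list yields a single level (feature unused).
    """
    return bisect.bisect_right(boundaries, duration)


if __name__ == "__main__":
    pitches = [60, 62, 67, 55, 55]          # hypothetical melody
    intervals = pitch_intervals(pitches)    # relative feature
    print([quantize_interval(i, 4) for i in intervals])  # 9 levels
    print([quantize_interval(i, 0) for i in intervals])  # 1 level

    durations = [0.1, 0.25, 0.3, 2.0]       # hypothetical, in seconds
    print([quantize_duration(d, [0.25, 0.5, 1.0]) for d in durations])
```

The two extreme settings reproduce the range described above: a single level discards the feature, while enough levels to cover all values leave it unquantized.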
As shown in Table 5, three indexing schemes
have been used: PIT, which uses only relative pitch
information, with N = 9 levels of quantization of
melodic intervals; IOI, which uses only absolute
duration information, with N = 11 levels of
quantization of exact durations; and BTH, which uses
both relative pitch and absolute duration. Having used