Visual feature selection: apply a feature selection method to keep the informative
visual features and reduce the dimensionality of the feature space.
Employment of learning-to-rank algorithms: randomly partition the dataset into n
folds. Hold out one fold as the test set and train the ranking function on the
remaining data using a learning-to-rank method. Predict the relevance scores of the
test set. Repeat until each fold has been held out for testing once. The predicted
scores from the different folds are combined to generate a new visual ranking score.
Rank aggregation: After normalization, the ranking score produced by the orig-
inal text retrieval function and the visual ranking score are linearly combined to
produce a merged score.
Re-rank: sort the merged scores to output a new ranked list of the multimedia
data.
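The cross-validated scoring, rank aggregation, and re-ranking steps above can be
sketched as follows. This is a minimal illustration under stated assumptions, not
the implementation used in the experiments: the names out_of_fold_scores, min_max,
rerank, toy_trainer, and the weight alpha are hypothetical, the training labels
are assumed to be supplied by the caller, min-max scaling is one possible choice of
normalization, and toy_trainer is a trivial pointwise stand-in for a real
learning-to-rank method.

```python
import random


def out_of_fold_scores(features, labels, train_fn, n_folds=5, seed=0):
    """Score every item with a model that did not see it during training."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    # Round-robin split of the shuffled indices into n_folds disjoint folds.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    scores = [0.0] * len(features)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        # train_fn is a placeholder for any learning-to-rank trainer; it must
        # return a callable that maps one feature vector to a relevance score.
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in test_idx:
            scores[i] = model(features[i])
    return scores


def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def rerank(text_scores, visual_scores, alpha=0.5):
    """Linearly combine normalized scores and sort items by the merged score."""
    merged = [alpha * t + (1 - alpha) * v
              for t, v in zip(min_max(text_scores), min_max(visual_scores))]
    order = sorted(range(len(merged)), key=lambda i: merged[i], reverse=True)
    return order, merged


def toy_trainer(train_features, train_labels):
    """Assumed stand-in learner: weights are the label-weighted feature mean."""
    n, dim = len(train_features), len(train_features[0])
    w = [sum(y * x[j] for x, y in zip(train_features, train_labels)) / n
         for j in range(dim)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))
```

Here alpha weights the text score against the visual score; in practice it would
be tuned on held-out data, and toy_trainer would be replaced by an actual
learning-to-rank trainer with the same calling convention.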
The above techniques have been tested in different scenarios of multimedia re-
trieval, such as image tag recommendation and multiple canonical image selection.
The corresponding experiments indicate that the learning-to-rank methods can sig-
nificantly boost the ranking performance over the initial text-based retrieval results,
and can also outperform many heuristic-based re-ranking methods proposed in the
literature.
14.4 Text Summarization
It has become standard for search engines to augment result lists with document
summaries. Each document summary may consist of a title, abstract, and a URL. It
is important to produce high quality summaries, since the summaries can bias the
perceived relevance of a document.
Document summarization can be either query-independent or query-dependent.
A query-independent summary conveys general information about the document,
and typically includes a title, a static abstract, and a URL, if applicable. The main
problem with query-independent summarization is that the summary for a document
never changes across queries. In contrast, query-dependent summarization biases
the summary towards the query. These summaries typically consist of a title, a
dynamic abstract, and a URL. Since they depend on the query, they are typically
constructed at query time.
In [10], the authors study the use of learning-to-rank technologies for generating
a query-dependent document summary. In particular, the focus is placed on the task
of selecting relevant sentences for inclusion in the summary.
For this purpose, one first needs to extract features to represent each
sentence. In [10], the following features are extracted.
Query-dependent features, including exact match, the fraction of query terms
that occur in the sentence, the fraction of synonyms of query terms that occur in
the sentence, and the output of the language model.
Query-independent features, including the total number of terms in the sentence,
and the relative location of the sentence within the document.
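To make the feature list concrete, the sketch below computes most of these
features for one sentence. It is an illustration under assumptions rather than the
extraction procedure of [10]: the function names, the simple tokenizer, and the
synonyms dictionary (a hypothetical resource mapping each query term to its
synonyms) are invented for the example, and the language model feature is left
out because the text does not specify the model.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_features(query, sentences, position, synonyms=None):
    """Return features for sentences[position] with respect to the query."""
    synonyms = synonyms or {}
    q_terms = tokenize(query)
    s_terms = tokenize(sentences[position])
    s_set = set(s_terms)
    unique_q = set(q_terms)

    # Exact match: the query tokens occur contiguously in the sentence.
    n = len(q_terms)
    exact = n > 0 and any(s_terms[i:i + n] == q_terms
                          for i in range(len(s_terms) - n + 1))

    # Fraction of distinct query terms that occur in the sentence.
    term_overlap = (sum(t in s_set for t in unique_q) / len(unique_q)
                    if unique_q else 0.0)

    # Fraction of the query terms' synonyms that occur in the sentence.
    all_syn = set().union(*(synonyms.get(t, ()) for t in unique_q))
    syn_overlap = (sum(s in s_set for s in all_syn) / len(all_syn)
                   if all_syn else 0.0)

    # Relative location: 0.0 for the first sentence, 1.0 for the last.
    last = len(sentences) - 1
    location = position / last if last > 0 else 0.0

    return {
        "exact_match": 1.0 if exact else 0.0,
        "term_overlap": term_overlap,
        "synonym_overlap": syn_overlap,
        "length": len(s_terms),
        "relative_location": location,
    }
```

The resulting feature dictionaries, one per sentence, would then be passed to a
learning-to-rank method that orders the sentences of a document by their relevance
to the query.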