The second processing step, shown in Formula 4.4, then normalizes the score of each
document by multiplying the document's original score by the weight of its collection.
With this strategy, the more important a collection is for the given search query, the
larger the boost that documents returned from this collection receive.
$$D' = W(d \mid q) \cdot D \qquad (4.4)$$
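A minimal Python sketch of Formula 4.4 is given below. It assumes that the collection weights W(d|q) have already been computed by the first processing step (not shown here) and are passed in as a dictionary keyed by collection name; the names `Result`, `weighted_merge`, `engineA`, and `engineB` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    doc_id: str
    collection: str
    score: float  # original score D returned by the collection's search engine


def weighted_merge(results, collection_weights):
    """Rescale each score as D' = W(d|q) * D and merge into one ranked list.

    Assumption: `collection_weights` maps a collection name to its
    precomputed weight for the current query (hypothetical input format).
    """
    rescaled = [
        Result(r.doc_id, r.collection, collection_weights[r.collection] * r.score)
        for r in results
    ]
    # Highest boosted score first; ties keep their input order (sorted is stable).
    return sorted(rescaled, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    weights = {"engineA": 0.9, "engineB": 0.3}
    results = [
        Result("a1", "engineA", 0.6),
        Result("a2", "engineA", 0.4),
        Result("b1", "engineB", 0.9),
    ]
    for r in weighted_merge(results, weights):
        print(r.doc_id, round(r.score, 2))
```

In this toy run, `b1` has the highest raw score but ends up last in the merged list, because its collection carries a low weight for the query.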
4.4.2 Merging Effects
To illustrate the effect that different merging methods can have on the performance
of search results merging, we performed an experiment using the FedWeb 2012
dataset [ 26 ]. This dataset consists of documents from 108 different sources, divided
into 12 categories. Table 4.1 lists all categories together with examples of the search
engines in each. The dataset also includes 50 TREC queries with human relevance
judgments for the documents retrieved from each search engine.
To evaluate result merging performance, we ran experiments with the 50 queries
provided with this dataset and measured three metrics: precision, recall, and
Normalized Discounted Cumulative Gain (NDCG) [ 17 ]. Each metric is computed at
two rank cutoffs, @5 and @10. To gain insight into how performance changes, we
repeated each experiment round multiple times, varying the number of selected
collections. In the results below, we present the three metrics for collections selected
from three, six, and twelve categories, respectively.
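The three metrics can be sketched in Python as follows. This is an illustration, not the evaluation code used for the experiments above: it assumes graded relevance judgments given as a dictionary (0 means not relevant), treats any positive grade as relevant for precision and recall, and uses the linear-gain form of DCG with a log2(i + 1) discount for rank i. [ 17 ] and standard TREC tooling may use a different gain or discount variant. All function names are hypothetical.

```python
import math


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranks holding a relevant document.

    If fewer than k documents were returned, missing ranks count as non-relevant.
    """
    return sum(1 for d in ranking[:k] if d in relevant) / k


def recall_at_k(ranking, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranking[:k] if d in relevant) / len(relevant)


def dcg_at_k(gains, k):
    # Linear gain; the document at 1-based rank i is discounted by log2(i + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking, judgments, k):
    """judgments maps doc_id -> graded relevance (0 = not relevant)."""
    gains = [judgments.get(d, 0) for d in ranking]
    # Ideal ranking: all judged documents sorted by decreasing grade.
    ideal = sorted(judgments.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4", "d5"]  # merged result list for one query
    judgments = {"d1": 1, "d3": 2, "d7": 1}   # hypothetical relevance judgments
    relevant = {d for d, g in judgments.items() if g > 0}
    for k in (5, 10):
        print(f"P@{k}={precision_at_k(ranking, relevant, k):.3f}",
              f"R@{k}={recall_at_k(ranking, relevant, k):.3f}",
              f"nDCG@{k}={ndcg_at_k(ranking, judgments, k):.3f}")
```

In the toy example, two of the five returned documents are relevant and one relevant document (`d7`) was never retrieved, so P@5 is 2/5 and R@5 is 2/3. Averaging such per-query values over the 50 queries would give one figure per metric and cutoff.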
Figures 4.1 and 4.2 show the results for precision@5 and precision@10, respectively.
Recall measurements, r@5 and r@10, are shown in Figs. 4.3 and 4.4.
Table 4.1 Overview of the categorization in the FedWeb 2012 dataset
Category                     Count   Examples
General Web Search           10      Google, Yahoo, AOL, Bing, Baidu
Multimedia                   21      Hulu, YouTube, Photobucket
Q&A                          2       Yahoo Answers, Answers.com
Jobs                         7       LinkedIn Jobs, Simply Hired
Academic                     16      Nature, CiteSeerX, SpringerLink
News                         8       Google News, ESPN
Shopping                     6       Amazon, eBay, Discovery Channel Store
Encyclopedia/Dict            6       Wikipedia, Encyclopedia Britannica
Books and libraries          3       Google Books, Columbus Library
Social and social sharing    7       Facebook, MySpace, Tumblr, Twitter
Blogs                        5       Google Blogs, WordPress
Other                        17      OER Commons, MSDN, Starbucks