As explained in Sects. 4.2.1 and 4.2.2, we create multiple indices for each of
the repositories we want to search. An interesting challenge is how to access these
indices. The authors of [5, 37] propose to use software brokers as a centralized
component that communicates with these indices. To implement this communication
model, we realize our distributed search system as a multi-agent system in which the
indices are handled by search agents. These search agents are coordinated by a broker
agent that acts as the centralized component. By using a broker agent, we can
process a search query without communicating directly with every search agent. The
broker agent is also responsible for verifying users' credentials. After successful
verification, the broker agent collects the user's reading rights and group memberships
from the LDAP server through an LDAP agent and forwards this information to the
search agents. The search agents then match the user's rights against the access lists
acquired during crawling. As a result, users only receive search results for documents
they have the right to access.
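The access-control flow described above can be sketched in Python as follows. This is a minimal, in-memory illustration under loudly stated assumptions: `LdapAgent`, `Document`, `SearchAgent`, and `BrokerAgent` are hypothetical classes, the LDAP server is replaced by a dictionary, the credential check is a toy plaintext comparison, and search agents match query terms by naive substring tests. A real deployment would bind against the LDAP server and query a full-text index.

```python
from dataclasses import dataclass, field


@dataclass
class LdapAgent:
    """Hypothetical stand-in for the LDAP agent: user -> (password, groups)."""
    directory: dict[str, tuple[str, set[str]]]

    def verify(self, user: str, password: str) -> bool:
        # Toy plaintext check; a real agent would perform an LDAP bind.
        entry = self.directory.get(user)
        return entry is not None and entry[0] == password

    def principals(self, user: str) -> set[str]:
        # The user's own name plus all of their group memberships.
        return {user} | self.directory[user][1]


@dataclass
class Document:
    doc_id: str
    text: str
    acl: set[str]  # principals with reading rights, recorded during crawling


@dataclass
class SearchAgent:
    documents: list[Document]

    def search(self, query: str, principals: set[str]) -> list[str]:
        terms = query.lower().split()
        return [
            d.doc_id
            for d in self.documents
            # Naive substring match on the document text...
            if any(t in d.text.lower() for t in terms)
            # ...restricted to documents whose ACL shares a principal with the user.
            and d.acl & principals
        ]


@dataclass
class BrokerAgent:
    ldap: LdapAgent
    agents: list[SearchAgent] = field(default_factory=list)

    def search(self, user: str, password: str, query: str) -> list[str]:
        if not self.ldap.verify(user, password):
            raise PermissionError("invalid credentials")
        # Collect the user's rights once and forward them to every search agent.
        principals = self.ldap.principals(user)
        hits: list[str] = []
        for agent in self.agents:
            hits.extend(agent.search(query, principals))
        return hits


ldap = LdapAgent({"alice": ("pw-a", {"hr"}), "bob": ("pw-b", set())})
agent = SearchAgent([
    Document("d1", "Salary report", {"hr"}),
    Document("d2", "Salary FAQ", {"alice", "bob"}),
])
broker = BrokerAgent(ldap, [agent])
print(broker.search("alice", "pw-a", "salary"))  # ['d1', 'd2']
print(broker.search("bob", "pw-b", "salary"))    # ['d2']
```

Because the ACL filter runs inside each search agent, the broker never sees documents the user may not read; only the user's principals travel from the broker to the agents.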
4.3.3 Retrieval
In contrast to an information retrieval system with a single index, distributed
information retrieval systems rely on multiple indices that are usually created
independently of each other [10, 30]. When a user triggers retrieval by formulating a
search query, related documents are retrieved from each index and returned as a
result list. Although not every index will necessarily contain relevant documents,
documents will very likely be found in more than one index, i.e., multiple ranked
lists are created. An interesting research challenge is to merge these lists so that all
search results are presented in one combined result list. In Sect. 4.4, we compare
the performance of different state-of-the-art unsupervised result merging algorithms
using the FedWeb 2012 dataset [26].
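To make the merging problem concrete, here is a minimal sketch of one common unsupervised baseline: min-max normalization of each ranked list followed by CombSUM fusion. It is not necessarily one of the algorithms compared in Sect. 4.4. We assume each index returns (document ID, raw score) pairs and that a document carries the same ID in every index; the function names `min_max` and `comb_sum` are our own.

```python
def min_max(results: list[tuple[str, float]]) -> dict[str, float]:
    """Rescale the raw scores of one ranked list to [0, 1]."""
    if not results:
        return {}
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every document as equally (maximally) relevant.
        return {doc_id: 1.0 for doc_id, _ in results}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in results}


def comb_sum(result_lists: list[list[tuple[str, float]]]) -> list[tuple[str, float]]:
    """Merge ranked lists by summing each document's normalized scores."""
    fused: dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in min_max(results).items():
            # Assumption: the same document has the same ID in every index,
            # so a document found in several indices accumulates evidence.
            fused[doc_id] = fused.get(doc_id, 0.0) + score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


index_a = [("d1", 12.0), ("d2", 8.0), ("d3", 4.0)]  # raw scores on one index's scale
index_b = [("d2", 0.9), ("d4", 0.3)]                # scores on a different scale
print(comb_sum([index_a, index_b]))
# [('d2', 1.5), ('d1', 1.0), ('d3', 0.0), ('d4', 0.0)]
```

Normalization is what makes the lists comparable: without it, the raw scores of `index_a` would dominate those of `index_b` purely because of their scale. Summing rewards `d2` for appearing in both lists.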
Once the broker agent has received answers from all search agents, it normalizes
these results and re-ranks them into a single search result list. As long as all
repositories are reachable by one broker, this broker alone suffices to access all
indices. However, there are cases in which a broker cannot reach some repositories,
e.g., due to physical network boundaries. A typical example is a user's local desktop:
local desktop computers can usually access file servers, but not vice versa. In this
case, multiple brokers need to be considered, each responsible for a specific group of
indices in the network.
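The multi-broker case can be sketched as a one-directional reachability graph between brokers. In this hypothetical Python sketch, a desktop broker can forward queries to a file-server broker but not vice versa. Each broker normalizes its own agents' scores by dividing by the list's maximum score and keeps a document's highest score when lists overlap. The class names, the term-count scoring, and the max-score normalization are illustrative assumptions, and we assume the reachability graph has no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class SearchAgent:
    """Hypothetical agent wrapping one index (doc_id -> text)."""
    name: str
    index: dict[str, str]

    def search(self, query: str) -> list[tuple[str, float]]:
        # Toy scoring: number of query-term occurrences in the document.
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.index.items():
            tokens = text.lower().split()
            score = sum(tokens.count(t) for t in terms)
            if score > 0:
                hits.append((doc_id, float(score)))
        return sorted(hits, key=lambda kv: kv[1], reverse=True)


def normalize_by_max(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    # Divide by the list's top score so every list lies in (0, 1].
    if not results:
        return []
    top = max(score for _, score in results)
    return [(doc_id, score / top) for doc_id, score in results]


@dataclass
class Broker:
    """Hypothetical broker responsible for one group of indices."""
    name: str
    agents: list[SearchAgent] = field(default_factory=list)
    # Brokers this broker can reach; reachability is one-directional and
    # assumed acyclic (otherwise the recursion below would not terminate).
    reachable: list["Broker"] = field(default_factory=list)

    def search(self, query: str) -> list[tuple[str, float]]:
        lists = [normalize_by_max(a.search(query)) for a in self.agents]
        # Results from reachable brokers are already normalized and merged.
        lists += [b.search(query) for b in self.reachable]
        merged: dict[str, float] = {}
        for results in lists:
            for doc_id, score in results:
                # Keep the highest normalized score if a document appears twice.
                merged[doc_id] = max(merged.get(doc_id, 0.0), score)
        return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


# The desktop broker can reach the file-server broker, but not vice versa.
file_server = Broker("file-server", [SearchAgent("fs", {"f1": "project plan plan", "f2": "budget"})])
desktop = Broker("desktop", [SearchAgent("desk", {"n1": "plan notes"})], reachable=[file_server])
print(desktop.search("plan"))      # contains n1 and f1
print(file_server.search("plan"))  # contains only f1
```

Issuing the query at the desktop broker covers both indices, whereas the file-server broker only sees its own index, mirroring the asymmetric network access described above.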
4.4 Evaluation of Result Merging Algorithms
One of the tasks in distributed information retrieval (see Sect. 4.2.2) is result merging.
Its purpose is to merge multiple result lists into a single re-ranked list. For this task,
the broker needs to normalize every document's score from each result list from