Information Aggregation in an Enterprise - Smart Information Systems: Computational Intelligence for Real-Life Applications


As explained in Sects. 4.2.1 and 4.2.2, we create multiple indices for each of

the repositories we want to search. An interesting challenge is how to access these

indices. The authors of [5, 37] propose software brokers as a centralized component that can

communicate with these indices. To implement this communication model,

we realize our distributed search system as a multi-agent system. We model the

multiple indices to be handled by search agents. These search agents are brokered

by a broker agent as a centralized component. By using a broker agent we are able to

process a search query without direct communication with every search agent. The

broker agent is also responsible for verifying users' credentials. After successful user

verification, the broker agent collects the user's reading rights and group memberships from

the LDAP server through an LDAP agent and forwards this information to the search

agents. The search agents then match the users' rights with the access list acquired

from the crawling process. This means that users receive as search results only those

documents for which they have access rights.
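
The access check performed by a search agent can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the names `IndexedDocument`, `allowed_principals`, and `filter_by_access` are hypothetical, and it assumes that the access list acquired during crawling is stored per document as a set of user and group names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDocument:
    """A search hit as a search agent might hold it (hypothetical structure)."""
    doc_id: str
    score: float
    # Assumption: the crawl-time access list, flattened to user and group names.
    allowed_principals: frozenset


def filter_by_access(results, user, groups):
    """Keep only the hits that the user, or one of the user's groups, may read.

    `user` and `groups` stand for the information the broker agent forwards
    after it has queried the LDAP agent.
    """
    principals = {user} | set(groups)
    # A document is visible if its access list shares at least one principal.
    return [doc for doc in results if doc.allowed_principals & principals]


hits = [
    IndexedDocument("report.pdf", 2.4, frozenset({"alice"})),
    IndexedDocument("minutes.doc", 1.7, frozenset({"staff"})),
    IndexedDocument("payroll.xls", 1.1, frozenset({"admins"})),
]
visible = filter_by_access(hits, user="alice", groups=["staff"])
```

Filtering inside each search agent, rather than in the broker agent, means that documents a user may not read never leave the agent that indexed them.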

4.3.3 Retrieval

In contrast to an information retrieval system with one single index, distributed

information retrieval systems rely on multiple indices that are usually created

independently of each other [10, 30]. When users trigger a retrieval by formulating a

search query, related documents are retrieved within each index and then returned as

a result list. Although not every index will necessarily contain relevant documents, it

is more than likely that documents will be found in more than one index, i.e., multiple

ranked lists are created. An interesting research challenge is to merge these lists

and thus present all search results in one combined result list. In Sect. 4.4, we compare

the performance of different state-of-the-art unsupervised result merging algorithms

using the FedWeb 2012 dataset [26].

Once the broker agent has received answers from all search agents, it

normalizes these results and re-ranks them into a single search result list. As long as

all of these repositories are reachable by one broker, only this broker is required to

access all indices. However, there are cases when repositories cannot be reached by

a broker, e.g., due to physical network boundaries. A typical example is the local

desktop of a user. Local desktop computers can usually access file servers, but not

vice versa. In this case, multiple brokers need to be considered. Each of these brokers

is responsible for a specific group of indices in the network.
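
The normalize-and-re-rank step that a broker agent performs can be sketched as follows. The text does not fix a particular normalization at this point, so the sketch assumes simple min-max scaling of each result list to the range [0, 1] and keeps the best normalized score when a document appears in several lists. The function names and the `(doc_id, score)` tuple format are hypothetical.

```python
def min_max_normalize(results):
    """Scale the scores of one result list to [0, 1].

    `results` is a list of (doc_id, score) pairs from a single search agent.
    """
    if not results:
        return []
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal, so no hit can be preferred over another.
        return [(doc_id, 1.0) for doc_id, _ in results]
    return [(doc_id, (score - lo) / (hi - lo)) for doc_id, score in results]


def merge_result_lists(result_lists):
    """Merge several ranked lists into one list sorted by normalized score."""
    merged = {}
    for results in result_lists:
        for doc_id, score in min_max_normalize(results):
            # Assumption: a document found in several indices keeps its best score.
            merged[doc_id] = max(score, merged.get(doc_id, 0.0))
    # Sort by descending score; ties are broken by document id for determinism.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


file_server_hits = [("a", 10.0), ("b", 5.0), ("c", 0.0)]
wiki_hits = [("d", 0.9), ("b", 0.8), ("e", 0.1)]
merged = merge_result_lists([file_server_hits, wiki_hits])
```

Because raw scores from independently built indices are not comparable (here one list ranges up to 10.0 and the other only up to 0.9), some form of normalization is needed before the lists can be combined. In a multi-broker setup, the same function could be applied again to the lists returned by the individual brokers.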

4.4 Evaluation of Result Merging Algorithms

One of the tasks in distributed information retrieval (see Sect. 4.2.2) is result merging.

Its purpose is to merge multiple result lists into a single re-ranked list. For this task,

the broker needs to normalize every document's score from each result list from
