and (2) by using distributed information retrieval on multiple disjoint indices. The
choice of which form is ideal depends on many factors, such as geophysical
limitations and network capacity. Distributed information retrieval is a research area
with open questions in each of its well-defined processing steps: collection
representation, collection selection, and result merging. Distributed information
retrieval addresses the challenge of managing multiple indices by using a broker.
Another research topic that we discussed is multi-agent systems. In a multi-agent
system, multiple agents interact with each other in a distributed manner. This feature
gives us an approach for implementing a distributed information retrieval
engine. In our case, we encapsulate functionalities such as crawling, retrieving,
brokering, and user profile management in specialized agents. Search agents are contacted
by a broker agent, which facilitates result merging across different collections.
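The broker pattern described above can be sketched as follows. This is a minimal illustration, not the JIAC V implementation: the class names `SearchAgent` and `BrokerAgent`, the term-count scoring, and the per-collection max-score normalization used for merging are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    collection: str


class SearchAgent:
    """Hypothetical search agent that owns one disjoint index."""

    def __init__(self, name: str, index: Dict[str, str]) -> None:
        self.name = name
        self.index = index  # doc_id -> document text

    def search(self, query: str) -> List[Hit]:
        terms = query.lower().split()
        hits = []
        for doc_id, text in self.index.items():
            words = text.lower().split()
            # Toy relevance score: number of query-term occurrences.
            score = sum(words.count(term) for term in terms)
            if score > 0:
                hits.append(Hit(doc_id, float(score), self.name))
        return hits


class BrokerAgent:
    """Hypothetical broker that contacts search agents and merges results."""

    def __init__(self, agents: Iterable[SearchAgent]) -> None:
        self.agents = list(agents)

    def search(self, query: str, limit: int = 10) -> List[Hit]:
        merged: List[Hit] = []
        for agent in self.agents:
            hits = agent.search(query)
            if not hits:
                continue
            # Assumed merging strategy: scale each collection's scores to
            # [0, 1] so that raw scores from different indices are comparable.
            top = max(hit.score for hit in hits)
            merged.extend(
                Hit(hit.doc_id, hit.score / top, hit.collection) for hit in hits
            )
        # Highest score first; ties broken deterministically.
        merged.sort(key=lambda hit: (-hit.score, hit.collection, hit.doc_id))
        return merged[:limit]
```

Because the broker only depends on the `search` method of each agent, further collections can be added without changing the merging logic.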
We introduced our distributed enterprise search system, which is deployed in a pilot
project with the administration offices of the city of Berlin. In this pilot project we
emphasized how different network areas, each with multiple repositories, are handled
by our implementation, which uses JIAC V as its multi-agent framework. The separation
between each of the city districts as an independent department requires deployment
of multiple broker agents. Each network area has its own characteristics regarding
user authentication. In our implementation we decided that the enforcement of
document-level security should be managed by the search engine. We achieve this by
attaching the available access lists to every file we crawl and by saving this information
during the indexing process. When processing a search request, every contacted broker
agent verifies the user's credentials and forwards this information to the search
agents. The search agents process the search query along with the user's credentials to
filter the relevant documents. This process allows the search engine to return search
results containing only documents accessible to the user.
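The document-level security flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: the names `SecureSearchAgent`, `SecureBroker`, and `Credential`, the in-memory user directory, and the keyword matching are all hypothetical and stand in for the real authentication and indexing components.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    allowed: FrozenSet[str]  # access list stored at indexing time


@dataclass(frozen=True)
class Credential:
    user: str
    groups: FrozenSet[str]

    def principals(self) -> FrozenSet[str]:
        # A user matches an access list by name or by group membership.
        return self.groups | {self.user}


class SecureSearchAgent:
    """Hypothetical search agent that filters hits by access list."""

    def __init__(self) -> None:
        self._docs: List[IndexedDoc] = []

    def index(self, doc_id: str, text: str, access_list: Iterable[str]) -> None:
        # The access list is saved together with the document.
        self._docs.append(IndexedDoc(doc_id, text, frozenset(access_list)))

    def search(self, query: str, credential: Credential) -> List[str]:
        terms = set(query.lower().split())
        principals = credential.principals()
        return [
            doc.doc_id
            for doc in self._docs
            if terms & set(doc.text.lower().split()) and doc.allowed & principals
        ]


class SecureBroker:
    """Hypothetical broker that verifies the user before forwarding a query."""

    def __init__(
        self, agents: Iterable[SecureSearchAgent], directory: Dict[str, Set[str]]
    ) -> None:
        self.agents = list(agents)
        self.directory = directory  # assumed user directory: user -> groups

    def _verify(self, user: str) -> Optional[Credential]:
        groups = self.directory.get(user)
        if groups is None:
            return None  # unknown user
        return Credential(user, frozenset(groups))

    def search(self, query: str, user: str) -> List[str]:
        credential = self._verify(user)
        if credential is None:
            return []
        results: List[str] = []
        for agent in self.agents:
            results.extend(agent.search(query, credential))
        return results
```

Filtering inside the search agent, rather than after merging, means that inaccessible documents never leave the collection that holds them.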
For future work, we want to improve the merging of search results. To obtain a
better merged result, we need to learn to prioritize relevant data collections.
Collection selection is a step in distributed information retrieval that has not yet
been properly explored in our enterprise setting. In ordinary environments, collection
selection can rely solely on relevance to the search query. However, in enterprise
environments, we also have to consider the security aspect before retrieving results.
For example, even though a repository is relevant, it is possible that most of the
documents in this repository are not accessible to the current user. Selecting such a
repository could limit the results a user can retrieve and may reduce recall when another
relevant repository is not selected. To address this issue, we are currently investigating
how to gather the right evidence [3, 11, 20] for selecting the right collections, considering
the security aspects of enterprise environments. In addition, we aim to incorporate
multimedia content into our system, which requires further processing [16]. Finally,
we intend to improve user interaction, e.g., by introducing gamification elements
that incentivize users to interact with the system. Preliminary studies
[23, 24] in this direction are promising.
Acknowledgments We would like to thank ITDZ Berlin for their support and cooperation in
realizing the pilot project.