Information Aggregation in an Enterprise - Smart Information Systems: Computational Intelligence for Real-Life Applications

and (2) by using distributed information retrieval on multiple disjoint indices. The

choice of which form is ideal depends on many factors, such as geographical limitations

or network capacity. Distributed information retrieval is a research area

with open research questions in well-defined processing steps. These steps are

collection representation, collection selection, and result merging. Distributed information

retrieval addresses the challenge of managing multiple indices by using a broker.

Another research topic that we discussed is multi-agent systems. In a multi-agent

system, multiple agents interact with each other in a distributed manner. This feature

provides us with an approach for implementing a distributed information retrieval

engine. In our case, we encapsulate functionalities such as crawling, retrieval, brokering,

and user profile management in specialized agents. Search agents are contacted

by a broker agent, thus facilitating the result merging between different collections.

We introduce our distributed enterprise search system that is deployed in a pilot

project with the administration offices of the city of Berlin. In this pilot project we

emphasized how different network areas, each with multiple repositories, are handled

by our implementation, which uses JIAC V as its multi-agent framework. The separation

of the city districts into independent departments requires the deployment

of multiple broker agents. Each network area has its own characteristics regarding

user authentication. In our implementation we decided that the enforcement of

document-level security should be managed by the search engine. This is solved by

adding available access lists to every file we crawled and by saving this information

during the indexing process. When processing a search request, every contacted broker

agent verifies the user's credentials and forwards this information to the search

agents. The search agents process the search query along with the user's credentials to

filter the relevant documents. This process allows the search engine to return search

results containing only documents accessible to the user.
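
The credential-based filtering described above can be illustrated with a minimal Python sketch. It is not the JIAC V implementation: the names `Doc`, `SearchAgent`, and `BrokerAgent`, the term-count scoring, and the `directory` mapping from users to groups are all assumptions made for illustration. Each document carries the access list stored at indexing time, the broker resolves the user's credentials and forwards them, and each search agent drops documents the user may not read before scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed: frozenset  # principals (users or groups) saved at indexing time


class SearchAgent:
    """Hypothetical search agent responsible for one collection."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = list(docs)

    def search(self, query, principals):
        terms = query.lower().split()
        hits = []
        for doc in self.docs:
            # Document-level security: skip anything the user cannot read.
            if not (doc.allowed & principals):
                continue
            words = doc.text.lower().split()
            score = sum(words.count(t) for t in terms)  # toy relevance score
            if score > 0:
                hits.append((score, doc.doc_id, self.name))
        return hits


class BrokerAgent:
    """Hypothetical broker: checks the user, fans out, merges results."""

    def __init__(self, agents, directory):
        self.agents = list(agents)
        self.directory = directory  # assumed mapping: user -> set of groups

    def search(self, user, query, k=10):
        if user not in self.directory:
            raise PermissionError(f"unknown user: {user}")
        principals = frozenset({user}) | frozenset(self.directory[user])
        hits = [h for agent in self.agents
                for h in agent.search(query, principals)]
        # Naive merge: sort by raw score, break ties by document id.
        return sorted(hits, key=lambda h: (-h[0], h[1]))[:k]
```

Merging by raw score is a simplification. Scores from different collections are generally not comparable without normalization, which is part of the result-merging problem.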

In future work, we want to improve the merging of search results. To

obtain a better merged result, we need to learn to prioritize relevant data collections.

Collection selection is a step in distributed information retrieval that has not yet

been properly explored in our enterprise setting. In non-enterprise environments, collection

selection can rely solely on relevance to the search query. However, in enterprise

environments, we also have to consider the security aspect before retrieving results.

For example, even though a repository is relevant, it is possible that most of the

documents in this repository are not accessible to the current user. Selecting such a repository

could limit the results a user can retrieve and may reduce recall when another

relevant repository is not selected. To address this issue, we are currently investigating how

to gather the right evidence [3, 11, 20] for selecting the right collections while considering

the security aspects of enterprise environments. In addition, we aim to incorporate

multimedia content into our system, which requires further processing [16]. Finally,

we intend to improve user interaction, e.g., by introducing gamification elements

into the system that incentivize users to interact with it. Preliminary studies

[23, 24] in this direction are promising.
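
To make the security-aware collection selection problem concrete, the following Python sketch shows one conceivable heuristic. It is an assumption for illustration only, not the method under investigation: the `CollectionStats` summary, the document-frequency relevance estimate, and the per-principal access counts are all hypothetical. The idea is simply to discount a collection's estimated relevance by the share of its documents the current user is likely able to read.

```python
from dataclasses import dataclass


@dataclass
class CollectionStats:
    """Hypothetical per-collection summary held by a broker."""
    name: str
    size: int          # number of indexed documents
    doc_freq: dict     # term -> number of documents containing the term
    acl_counts: dict   # principal -> number of documents it may read


def relevance(stats, terms):
    # Toy estimate: average share of documents containing each query term.
    return sum(stats.doc_freq.get(t, 0) for t in terms) / stats.size


def accessible_fraction(stats, principals):
    # Conservative lower bound: the best single principal's readable share.
    best = max((stats.acl_counts.get(p, 0) for p in principals), default=0)
    return best / stats.size


def select_collections(all_stats, terms, principals, n=1):
    scored = [
        (relevance(s, terms) * accessible_fraction(s, principals), s.name)
        for s in all_stats
        if s.size > 0
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:n] if score > 0]


archive = CollectionStats("archive", 100, {"budget": 80}, {"alice": 5})
office = CollectionStats("office", 50, {"budget": 10}, {"alice": 40})
selected = select_collections([archive, office], ["budget"], {"alice"}, n=1)
```

In this example `archive` is the more relevant collection by term statistics alone, but `alice` can read only 5 of its 100 documents, so the heuristic selects `office` instead.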

Acknowledgments We would like to thank ITDZ Berlin for their support and cooperation in

realizing the pilot project.
