Information Aggregation in an Enterprise - Smart Information Systems: Computational Intelligence for Real-Life Applications

enterprise search engines can be designed to provide access to all resources via one shared interface. The main requirement for setting up such a system is to crawl each repository and index it into a separate index.
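
The following is a minimal Python sketch of the per-repository indexing idea. It assumes that each repository can be represented as an in-memory mapping from document ids to text, and that matching is plain Boolean AND over whitespace-separated terms. The classes `RepositoryIndex` and `SharedSearchInterface` are hypothetical illustrations, not components described in the chapter. Each repository gets its own inverted index, and the shared interface forwards a query to every index and labels each hit with its source repository.

```python
from collections import defaultdict


class RepositoryIndex:
    """A separate inverted index for one repository (hypothetical helper)."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents          # doc_id -> text
        self.postings = defaultdict(set)    # term -> set of doc_ids
        for doc_id, text in documents.items():
            for term in text.lower().split():
                self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (Boolean AND)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


class SharedSearchInterface:
    """One entry point that fans a query out to every repository index."""

    def __init__(self, indices):
        self.indices = indices

    def search(self, query):
        hits = []
        for index in self.indices:
            for doc_id in sorted(index.search(query)):
                hits.append((index.name, doc_id))  # tag each hit with its source
        return hits


wiki = RepositoryIndex("wiki", {"w1": "travel policy for employees", "w2": "holiday calendar"})
files = RepositoryIndex("fileserver", {"f1": "travel expense form", "f2": "prototype blueprint"})
interface = SharedSearchInterface([wiki, files])
print(interface.search("travel"))  # [('wiki', 'w1'), ('fileserver', 'f1')]
```

Keeping one index per repository means that each repository can be crawled and re-indexed on its own schedule. The merged result list here is simply concatenated in repository order. How results from separate indices should be ranked against each other is a separate aggregation question.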

Another important aspect is data protection. An enterprise typically hosts multiple document repositories, or collections, that are not necessarily regulated by a single point of administration, which gives its data collections a distinctive structure. Companies are well advised to restrict access to certain information, such as blueprints of their prototypes, customer data, email correspondence, or other sensitive data, thereby protecting their assets from potential data theft [12, 14]. While some data are available to the whole enterprise (e.g., a company directory service or the company's web pages), other data are subject to various restrictions. For example, access could be restricted along hierarchical boundaries, such as department boundaries. In that case, data belonging to one department may not be shared with employees of other departments. Data protection is therefore a central concern in enterprise environments.
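
A department boundary of the kind described above can be expressed as a simple visibility check applied to search results. The sketch below assumes a single-level boundary: a document is either enterprise-wide (no owning department) or owned by exactly one department. The names `Document`, `Employee`, `may_access`, and `visible_documents` are hypothetical. In practice, department membership would come from the directory service rather than being hard-coded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Document:
    doc_id: str
    # None marks enterprise-wide data (e.g., company directory, web pages)
    owner_department: Optional[str]


@dataclass(frozen=True)
class Employee:
    name: str
    department: str


def may_access(employee: Employee, document: Document) -> bool:
    """Allow enterprise-wide documents; otherwise require the same department."""
    if document.owner_department is None:
        return True
    return document.owner_department == employee.department


def visible_documents(employee: Employee, documents: list) -> list:
    """Filter a result list down to the documents the employee may see."""
    return [d.doc_id for d in documents if may_access(employee, d)]


docs = [
    Document("company-directory", None),
    Document("rd-blueprint", "R&D"),
    Document("sales-customers", "Sales"),
]
alice = Employee("alice", "Sales")
print(visible_documents(alice, docs))  # ['company-directory', 'sales-customers']
```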

File distribution, heterogeneity, and access restriction play a key role in enterprise search systems that aim to assist employees in their daily information-gathering tasks. In this chapter, we introduce an enterprise search system with distributed indices that addresses the data accumulation task of enterprise search systems. The framework incorporates the idea of data mining agents, a technique that has been successfully employed to create data warehouses [19]. We use autonomous agents for every task in the data accumulation and indexing activity, so that each agent provides core services covering a specific part of the back end. Complex tasks, such as crawling and indexing a file server, are achieved by combining the corresponding agents. The autonomous agents thus form a community that jointly provides search engine capabilities. When multiple data repositories (collections) need to be indexed, we use these agent communities to build a distributed search engine. Search requests are handled by broker agents that verify users' identities and access rights against the enterprise's directory access constraints, which are defined using the Lightweight Directory Access Protocol (LDAP).
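
The interplay between agent communities and broker agents can be sketched as follows, under several explicit assumptions:

- Agents are plain Python objects rather than autonomous processes.
- Each repository is an in-memory dictionary.
- The LDAP directory is simulated by a dictionary that maps LDAP-style distinguished names (DNs) of users to the DNs of their groups, so no LDAP server or client library is involved.

All class names (`CrawlerAgent`, `IndexerAgent`, `AgentCommunity`, `BrokerAgent`) and the example DNs are hypothetical. They illustrate the architecture described above, not the chapter's actual implementation.

```python
class CrawlerAgent:
    """Hypothetical agent that 'crawls' a repository (here: an in-memory dict)."""

    def __init__(self, repository):
        self.repository = repository  # path -> text

    def crawl(self):
        return list(self.repository.items())


class IndexerAgent:
    """Hypothetical agent that builds a term -> paths index from crawled items."""

    def index(self, items):
        postings = {}
        for path, text in items:
            for term in set(text.lower().split()):
                postings.setdefault(term, set()).add(path)
        return postings


class AgentCommunity:
    """A crawler and an indexer combined into one searchable collection service."""

    def __init__(self, collection, crawler, indexer, allowed_group):
        self.collection = collection
        self.allowed_group = allowed_group  # group DN permitted to search this collection
        self.postings = indexer.index(crawler.crawl())

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class BrokerAgent:
    """Checks a user's group membership before forwarding a query to communities."""

    def __init__(self, directory, communities):
        self.directory = directory  # user DN -> set of group DNs (simulated LDAP)
        self.communities = communities

    def search(self, user_dn, term):
        if user_dn not in self.directory:
            raise PermissionError(f"unknown user: {user_dn}")
        groups = self.directory[user_dn]
        results = {}
        for community in self.communities:
            # Only query collections whose required group the user belongs to
            if community.allowed_group in groups:
                results[community.collection] = community.search(term)
        return results


DIRECTORY = {
    "uid=alice,ou=people,dc=example,dc=com": {"cn=sales,ou=groups,dc=example,dc=com"},
}
sales = AgentCommunity(
    "sales-share",
    CrawlerAgent({"/sales/q3.txt": "quarterly report", "/sales/leads.txt": "customer leads"}),
    IndexerAgent(),
    "cn=sales,ou=groups,dc=example,dc=com",
)
rnd = AgentCommunity(
    "rnd-share",
    CrawlerAgent({"/rnd/proto.txt": "prototype report"}),
    IndexerAgent(),
    "cn=rnd,ou=groups,dc=example,dc=com",
)
broker = BrokerAgent(DIRECTORY, [sales, rnd])
print(broker.search("uid=alice,ou=people,dc=example,dc=com", "report"))
# {'sales-share': ['/sales/q3.txt']}
```

A real broker would replace the dictionary lookup with an authenticated bind and a group-membership search against the enterprise's LDAP server. Placing the check in the broker means that collection communities never receive queries from users who lack the required group.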

The chapter is structured as follows. Section 4.2 introduces related work in the fields of desktop and enterprise search. Section 4.3 discusses the technical challenges that need to be considered when building a distributed search engine in an enterprise environment. A comparison of different search result aggregation approaches is presented in Sect. 4.4. An exemplary implementation of such a system is described in Sect. 4.5. Section 4.6 concludes the chapter.

4.2 Related Work

This work builds upon prior research from several domains, including enterprise search, distributed information retrieval, and multi-agent systems. In the remainder of this section, we present these domains and highlight state-of-the-art research approaches.
