enterprise search engines can be designed to provide access to all resources via one
shared interface. The main requirement for setting up such a system is to crawl and
index each of the repositories in separate indices.
Another important aspect is data protection. The presence of multiple document
repositories, or collections, in an enterprise that are not necessarily regulated by a
single point of administration gives the data collections a unique structure.
Companies are well advised to restrict access to certain information, such as blueprints
of their prototypes, customer data, email correspondence, or other sensitive data,
thereby protecting their assets from potential data theft [12, 14]. While some data are
available to the whole enterprise (e.g., a company directory service or the
company's web pages), other data are subject to various restrictions. For example, data access could
be restricted based on hierarchical boundaries, such as a department boundary. This
means that data belonging to one department may not be shareable with employees
of other departments. Hence, data protection is an important aspect in enterprise
environments.
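
The department-boundary restriction described above can be illustrated with a
minimal sketch. It assumes a simplified model in which a document is either
visible to the whole enterprise or owned by exactly one department; the names
Document, Employee, can_access, and visible_documents are hypothetical and do
not come from the system described in this chapter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Document:
    title: str
    # Assumption: None marks an enterprise-wide document (e.g., the company
    # directory); otherwise this is the single department that owns it.
    department: Optional[str] = None


@dataclass(frozen=True)
class Employee:
    name: str
    department: str


def can_access(employee: Employee, doc: Document) -> bool:
    """True if the document is enterprise-wide or owned by the employee's department."""
    return doc.department is None or doc.department == employee.department


def visible_documents(employee: Employee, docs: List[Document]) -> List[Document]:
    """Keep only the documents the employee may see, preserving input order."""
    return [d for d in docs if can_access(employee, d)]
```

Real enterprises usually need finer-grained rules (groups, roles, per-file
permissions), so this predicate should be read only as an illustration of the
hierarchical boundary idea.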
File distribution, heterogeneity, and access restriction play a key role in the
application of enterprise search systems that aim to assist employees in their daily
information gathering tasks. In this chapter, we introduce an enterprise search system with
distributed indices that addresses the data accumulation task of enterprise search
systems. The framework incorporates the idea of data mining agents, a technique that
has been successfully employed to create data warehouses [19]. We use autonomous
agents for every task in the data accumulation and indexing activity, i.e., each agent
provides core services that cover a specific part in the back-end. Complex tasks such
as crawling and indexing a file server are achieved by combining the corresponding
agents, i.e., the autonomous agents form a community to provide a joint service
in creating search engine capabilities. When multiple data repositories (collections)
need to be indexed, we use these agent communities to build a distributed search
engine. Search requests are handled by broker agents that verify users' identities and
their access rights using the enterprise's directory access constraints, which are defined
using the Lightweight Directory Access Protocol (LDAP).
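
To make the agent-community idea more concrete, the following is a rough
sketch under strong simplifying assumptions, not the implementation described
later in the chapter. Each community combines a crawler agent and an indexer
agent for one collection, and a broker agent forwards a query only to the
communities the user may access. All class names are hypothetical, repositories
are plain dictionaries, and the LDAP lookup is replaced by an in-memory mapping
from users to groups.

```python
from collections import defaultdict


class CrawlerAgent:
    """Hypothetical agent that yields (doc_id, text) pairs from one repository."""

    def __init__(self, repository):
        self.repository = repository  # dict: doc_id -> text (stand-in for a file server)

    def crawl(self):
        yield from self.repository.items()


class IndexerAgent:
    """Hypothetical agent that maintains an inverted index for one collection."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def lookup(self, term):
        return set(self.index.get(term.lower(), set()))


class AgentCommunity:
    """Crawler and indexer combined into a joint service for one collection."""

    def __init__(self, name, repository, allowed_groups):
        self.name = name
        self.allowed_groups = set(allowed_groups)
        self.crawler = CrawlerAgent(repository)
        self.indexer = IndexerAgent()

    def build(self):
        for doc_id, text in self.crawler.crawl():
            self.indexer.add(doc_id, text)

    def search(self, term):
        return self.indexer.lookup(term)


class BrokerAgent:
    """Checks the user's groups, then queries only the permitted communities."""

    def __init__(self, communities, directory):
        self.communities = communities
        self.directory = directory  # dict: user -> set of groups (stand-in for LDAP)

    def search(self, user, term):
        groups = self.directory.get(user)
        if groups is None:
            raise PermissionError(f"unknown user: {user}")
        hits = []
        for community in self.communities:
            if community.allowed_groups & groups:
                hits.extend((community.name, d) for d in sorted(community.search(term)))
        return hits
```

In this sketch, access is decided per collection rather than per document, and
results are simply concatenated in community order; how results from several
indices should be aggregated is a separate question.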
The chapter is structured as follows. Section 4.2 introduces related work in the
fields of desktop and enterprise search. Section 4.3 introduces technical challenges
that need to be considered when building a distributed search engine in an enterprise
environment. A comparison of different search result aggregation approaches
is presented in Sect. 4.4. An exemplary implementation of such a system is described
in Sect. 4.5. Section 4.6 concludes this chapter.
4.2 Related Work
This work builds upon prior work from different research domains, including
enterprise search, distributed information retrieval, and multi-agent systems. In the
remainder of this section, we present these domains and highlight state-of-the-art
research approaches.