4.3 Technical Challenges
As mentioned in Sect. 4.2.1, there are two approaches to realizing search systems for
enterprise users. One possibility is to create a single index that contains documents
from various repositories. However, restrictions such as physical locations, different
administration policies, and bandwidth limitations make the data crawling process
difficult to perform efficiently [15]. Therefore, the creation of several distributed
indices is more feasible, as it eliminates the need to transfer large amounts of data
for creating a centralized index. In this section, we outline various conditions and
requirements for the creation of a distributed search engine in an enterprise environment.
The main focus of the section is on presenting technical issues that occur when such
a search system is set up. Section 4.3.1 first describes the types of data collections that
often occur in enterprise environments. Section 4.3.2 then outlines the required steps
for building multiple indices. The querying process of a distributed search engine is
illustrated in Sect. 4.3.3.
4.3.1 Typical Data Repositories
An enterprise is an organizational entity with a defined structure and boundaries that
involves many parties with a common interest. Given this defined structure and these
boundaries, information available within an enterprise environment can typically be
categorized by its content and by the access rights that apply to it.
The first type of information is publicly available and hence can be accessed by
both employees and other parties who are interested in the company. A typical
example is the company's webpage, which can be accessed from anywhere in the world.
Repositories of this type can be freely searched regardless of the user's permissions.
The second type of repository contains information that can only be accessed
internally, within the company's physical network. We can further divide this type
into two categories: (1) repositories that do not need authentication and (2) repositories
that require authentication. Intranet webpages, wiki pages, and similar data
repositories that can be found in the company's intranet fall under the first category.
As long as users connect from the company's IP ranges, they can freely open and access
the information. The second category represents repositories that contain protected
data, i.e., some form of authentication is required before they can be accessed. This
means that access from a company address alone is not enough; users must also
validate their credentials by logging in. Typical examples of such repositories
are file servers. Each file on these servers inherits explicit read and write rights for
individuals as well as for defined groups. When users log in, they are authenticated, and
through this authentication their rights, including information about group memberships,
can be obtained. This credential information determines and limits which data
or files a user can access. Consequently, a search engine that accesses these repositories
has to respect these permissions to avoid leaking protected information.
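The three access levels described above, and the requirement that search results respect them, can be illustrated with a minimal Python sketch. It is not taken from the chapter: the names Access, User, Document, may_read, and filter_results, as well as the IP range 10.0.0.0/8, are hypothetical and chosen only for illustration. The sketch assumes that each document carries the set of individual users and the set of groups that hold read rights, and that a user's group memberships are known once the user has logged in.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address, ip_network


class Access(Enum):
    PUBLIC = "public"                # e.g., the company's public webpage
    INTERNAL = "internal"            # intranet pages, no login required
    AUTHENTICATED = "authenticated"  # e.g., file servers with per-file rights


@dataclass(frozen=True)
class User:
    name: str
    ip: str
    groups: frozenset = frozenset()  # group memberships obtained at login
    logged_in: bool = False


@dataclass(frozen=True)
class Document:
    title: str
    access: Access
    readers: frozenset = frozenset()        # individual users with read rights
    reader_groups: frozenset = frozenset()  # groups with read rights


# Assumption: a made-up address range standing in for the company's IP ranges.
COMPANY_NETWORKS = [ip_network("10.0.0.0/8")]


def on_company_network(ip: str) -> bool:
    """Return True if the address lies in one of the company's IP ranges."""
    addr = ip_address(ip)
    return any(addr in net for net in COMPANY_NETWORKS)


def may_read(user: User, doc: Document) -> bool:
    """Decide whether a user may see a document in the search results."""
    if doc.access is Access.PUBLIC:
        return True
    if not on_company_network(user.ip):
        return False
    if doc.access is Access.INTERNAL:
        return True
    # Protected repository: a company address alone is not enough.
    if not user.logged_in:
        return False
    return user.name in doc.readers or bool(user.groups & doc.reader_groups)


def filter_results(user: User, hits: list) -> list:
    """Drop every hit the user is not allowed to read."""
    return [doc for doc in hits if may_read(user, doc)]


hits = [
    Document("Company homepage", Access.PUBLIC),
    Document("Intranet wiki", Access.INTERNAL),
    Document("Payroll.xlsx", Access.AUTHENTICATED,
             reader_groups=frozenset({"hr"})),
]
alice = User("alice", "10.1.2.3", groups=frozenset({"hr"}), logged_in=True)
guest = User("guest", "203.0.113.5")

print([d.title for d in filter_results(alice, hits)])
print([d.title for d in filter_results(guest, hits)])
```

In this sketch, alice is logged in from a company address and belongs to a group with read rights, so she sees all three hits, while guest connects from outside and sees only the public page. Filtering after retrieval is only one possible design; the permissions could equally be applied while the index is queried.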