Accessing Administrative Environmental Information

INTRODUCTION

As citizens became entitled to environmental information, the quantity of information on offer increased, and so did the effort needed to find the desired information on the many distributed Web sites. The Environmental Information Networks (EIN) of Baden-Wuerttemberg and Saxony-Anhalt, presented here, serve as central access platforms that facilitate search by offering the user a thematically structured approach and various search options. Both are instances of a pragmatic approach to the construction of environmental portals for the public.

BACKGROUND

In Germany, the supply of environmental information is one of the obligations of public administration. According to the Environmental Information Act (Umweltinformationsgesetz, 2005), citizens are entitled to have access to the environmental information available at an authority dealing with environmental tasks. For this reason, many authorities provide the corresponding information on the World Wide Web (WWW). Often, these offers have been developed by the individual authorities on their own responsibility and are not embedded in a larger context. The Environmental Information Act also makes an active supply of information obligatory for public authorities. This further enhances the role of the Internet as a means for the active provision of environmental information.

Many of the authorities’ Web sites offer only rudimentary search and navigation aids for their data. Frequently, full-text search is not available and no metadata, such as keywords, are attached to the contents. Links to related offers are often lacking completely or to a large extent. The contents have often been organized in line with the authority’s organizational structure rather than according to criteria that seem logical to the user.

For the citizen, this means that the information searched for is often found only when the authorities and their structure are known in detail. Even Internet search engines are not very helpful, as the large number of hits hides the information searched for.

Figure 1. System architecture of the EIN: Components and interfaces


With the German Environmental Information Network project gein® (http://www.gein.de) in 2000, an attempt was made to establish an environmental information portal at the federal level that offers search functions for information provided by federal and state authorities. Based on the model of gein®, such an environmental portal is also considered a promising approach at the regional level for the states of Baden-Wuerttemberg (http://www.umwelt.baden-wuerttemberg.de) and Saxony-Anhalt (http://www.umwelt.sachsen-anhalt.de).

An initial inventory of environmental information offers found more than 100 relevant Internet sites for the state of Baden-Wuerttemberg (2003) and more than 130 for the state of Saxony-Anhalt (2005). Because of their large number, sites from the municipal sector were not taken into account in this first step.

BASIC CONCEPTS

The development of this portal into an environmental information network is aimed at improving the networking of the distributed, environmentally relevant Web offers of the states of Baden-Wuerttemberg and Saxony-Anhalt (Schlachter, 2004a). Users from these states are to be given convenient access from a central point.

Metadata on all information offers are compiled centrally by an editorial staff. This database represents the starting point for the operation of the portal. The data are stored persistently using a content management system (CMS) and updated in this system via a WWW interface. The CMS provides interfaces to other components, for example, full-text search and automatic keyword search. Moreover, its templates allow for the presentation of the data, the layout, and the generation of navigation.

A major prerequisite for the operation of such a portal is that no, or only minimal, expenditure is needed for the maintenance of the Web sites referenced therein. This is what wins the portal the acceptance of the Web site operators.

Although the expenditure required for the integration of information offers in the EIN is to be minimized, individual interfaces have to be generated for certain information systems. This applies especially to offers that are generated dynamically, for example those that query statistical data or measurement values from databases. The expenditure needed to develop these interfaces is likewise to be minimized.

The users of EIN are offered several access paths to the individual information offers, in particular a thematic access, full-text search, keyword search, and other specialized access options.

According to the requirements outlined in the Act on Equal Opportunities of Handicapped Persons (Gesetz zur Gleichstellung behinderter Menschen, 2002), the entire presentation is tailored to barrier-free access, that is, the structure of the contents offered is implemented largely semantically in HTML, while the layout is described by means of style sheets.

SYSTEM ARCHITECTURE AND COMPONENTS

The EIN system architecture consists of various individual components. Standard components are to be used wherever possible. These components include:

• Central component, including

    • Central data storage

    • Maintenance interface for administrators and editorial staff

    • Presentation component

    • Data interface for external components

• Search engine for full-text search

• Search engine for keyword search

• Web server log file analysis and statistics tool

• Quality assurance tools

The central component of the architecture is implemented using a CMS. With its back-end database, it provides for the persistent storage of data and offers interfaces for administrators, editorial staff, and the users of EIN. In addition, the CMS supplies the necessary data to other components.

CONTENT MANAGEMENT SYSTEM (CMS)

The metadata are managed by a content management system (CMS). Thus, many functional parts of such a system can be used for the EIN. The most important function is the storage of the necessary data in the CMS or its back-end database. The maintenance interface for administrators and editorial staff may be mapped easily via the workflow support and the CMS authorization system.

The data interface for external components is implemented by making use of the programmability and extensibility of the CMS. Presentation of the contents as well as the automatic generation of navigation and menus for the portal may be accomplished using the template-based presentation mechanism of the CMS. This technology also facilitates the implementation of a barrier-free Web presentation (Chaves, 2003).

A major advantage of the CMS used is its capability of using an ontology to link the data it contains. All metadata are modelled as concepts and the relations connecting them. Based on this ontology, navigation and search facilities could be designed and implemented easily.
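The idea of modelling metadata as concepts connected by relations can be illustrated with a minimal sketch. It is not the WebGenesis data model; the class `Ontology`, the relation names `covers_issue` and `provided_by`, and the sample entries are assumptions made for illustration only.

```python
class Ontology:
    """Toy store of (source, relation, target) triples; hypothetical, not WebGenesis."""

    def __init__(self):
        self._triples = set()

    def relate(self, source, relation, target):
        """Record that `source` is connected to `target` by `relation`."""
        self._triples.add((source, relation, target))

    def targets(self, source, relation):
        """Concepts reachable from `source` via `relation`, sorted by name."""
        return sorted(t for s, r, t in self._triples if s == source and r == relation)

    def sources(self, relation, target):
        """Concepts that point to `target` via `relation`, sorted by name."""
        return sorted(s for s, r, t in self._triples if r == relation and t == target)


# Hypothetical sample data: two offers, two issues, one supplier.
onto = Ontology()
onto.relate("Water levels online", "covers_issue", "Water")
onto.relate("Water levels online", "provided_by", "State Environment Agency")
onto.relate("Soil atlas", "covers_issue", "Soil")

# A navigation menu for the issue "Water" is then a single reverse lookup.
water_offers = onto.sources("covers_issue", "Water")
```

Because navigation and search both reduce to lookups along relations, new access paths can be added without changing how the metadata are stored.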

Figure 2. A thematic issue selected in the EIN of Sachsen-Anhalt


THEMATIC ACCESS AND NAVIGATION

In the thematic approach, the variety of offers is structured by consistently grouping them under selected environmental issues, such as soil, water, and nature protection. Experience gained from the use of the gein® portal shows that a lean and flat structure best matches the wishes of the user.

The environmental issues are also administered by means of the CMS. They are defined as a separate object class and may be linked with other contents of the CMS via relations. On the basis of these data, the templates generate the corresponding menus and, thus, the navigation in the EIN.

Each of the offers integrated in the EIN is assigned by relations to a few environmental issues of high priority. Moreover, it may also be assigned to other issues of lower priority. This assignment is then reflected in the order in which offers are presented under a given environmental issue. For later extensions, these issues may be refined.
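How priorities might determine the order of presentation can be sketched as follows. The two-level priority scale (1 for high, 2 for lower), the `Offer` class, and the example.org URLs are assumptions for illustration; the article does not specify how priorities are encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    title: str
    url: str
    # issue name -> priority; assumed scale: 1 = high priority, 2 = lower priority
    issues: dict = field(default_factory=dict)


def offers_for_issue(offers, issue):
    """Return the offers assigned to `issue`, high-priority assignments first."""
    assigned = [offer for offer in offers if issue in offer.issues]
    # Sort by priority, then by title so that the order is stable and predictable.
    return sorted(assigned, key=lambda offer: (offer.issues[issue], offer.title))


# Hypothetical sample offers.
catalogue = [
    Offer("Flood forecasts", "https://example.org/flood", {"Water": 1}),
    Offer("Soil atlas", "https://example.org/soil", {"Soil": 1, "Water": 2}),
    Offer("Bathing water quality", "https://example.org/bathing", {"Water": 1}),
]

water_menu = [offer.title for offer in offers_for_issue(catalogue, "Water")]
```

In this sketch, the soil atlas appears under "Water" as well, but after the two offers for which water is a high-priority issue.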

FULL-TEXT SEARCH

Full-text search makes all Web sites connected to the EIN accessible. Via a corresponding data interface, it uses the metadata stored in the content management system; these data are also made available to other components and possible extensions.

Indexing of the individual Web sites takes place by means of a crawler that indexes complete Web sites in a fully automatic manner based on the references contained.
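The principle of following references to cover a complete Web site can be sketched with a small breadth-first crawler. This is not the crawler used in the EIN; the function `crawl`, its `fetch` parameter, and the `max_pages` limit are assumptions for illustration. The `fetch` callable is passed in so that the sketch can be run without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one host; `fetch(url)` returns HTML or None."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page unavailable: skip it
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            target, _fragment = urldefrag(urljoin(url, href))
            # Stay on the same host and visit every page only once.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

The returned mapping of URLs to page contents would then be handed to the indexing step.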

Figure 3. User interface of the EIN of Baden-Wuerttemberg


For individual Web sites, adaptations to this type of full-text indexing have to be made and the respective interfaces have to be generated.

The full-text search allows searching either all or only a part of the indexed Web sites. The user can limit full-text searches to those Web sites that are assigned to one or several of the aforementioned environmental issues. This thematic limitation is a major improvement over a conventional search of all Web sites, which may return a large number of irrelevant hits.
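The thematic limitation described above can be sketched as a filter applied before matching. The function names, the simple substring matching, and the sample data are assumptions for illustration and do not reflect the actual search engine.

```python
def restrict_sites(site_issues, selected_issues=None):
    """Return the sites assigned to at least one of the selected issues.

    `site_issues` maps a site name to the issues it is assigned to.
    Without a selection, all sites are searched.
    """
    if not selected_issues:
        return set(site_issues)
    wanted = set(selected_issues)
    return {site for site, issues in site_issues.items() if wanted & set(issues)}


def search(pages, site_issues, term, selected_issues=None):
    """Return URLs of pages containing `term`, limited to the allowed sites.

    `pages` is a list of (site, url, text) tuples.
    """
    allowed = restrict_sites(site_issues, selected_issues)
    term = term.lower()
    return [url for site, url, text in pages
            if site in allowed and term in text.lower()]


# Hypothetical sample data.
site_issues = {"water-agency": ["Water"], "soil-office": ["Soil"]}
pages = [
    ("water-agency", "https://example.org/w/levels", "Current water levels and limits"),
    ("soil-office", "https://example.org/s/limits", "Soil contamination limits"),
]

all_hits = search(pages, site_issues, "limits")
soil_hits = search(pages, site_issues, "limits", selected_issues=["Soil"])
```

Restricting the set of sites first means that irrelevant hits from other subject areas never enter the result list.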

Full-text search also is available for single Web sites and for use in other environmental information systems. In the near future, full-text search will make its results available in the OpenSearch format (http://opensearch.a9.com).

SEMANTIC NETWORK SERVICES (SNS)

The contents of the Web offers integrated in the EIN are made accessible not by full-text search alone, but also via a keyword search. The semantic network services (Bandholtz, 2003), developed on behalf of the Federal Environmental Authority (Umweltbundesamt) and used in the gein® portal, offer fully automatic keywording of WWW sites with semantic integration of an environmental thesaurus, geographical names, and chronology. Ambiguities are resolved by a context analysis. Keywords are weighted with respect to their significance for a particular document.

The EIN uses the Web service interface of the semantic network services for indexing of all documents contained. Relations between certain keywords and documents, as well as metadata, are stored in a special database.
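Storing weighted keyword-document relations can be sketched as follows. The table layout, the use of SQLite, and the sample keywords and weights are assumptions for illustration; the call to the semantic network services is not shown, and the weighted keywords are simply passed in as if already returned by that service.

```python
import sqlite3


def open_keyword_store():
    """Create an in-memory database with one table of keyword-document relations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_keyword ("
        " url TEXT NOT NULL,"
        " keyword TEXT NOT NULL,"
        " weight REAL NOT NULL,"
        " PRIMARY KEY (url, keyword))"
    )
    return conn


def store_keywords(conn, url, weighted_keywords):
    """Save the {keyword: weight} mapping for one document, replacing old entries."""
    rows = [(url, keyword.lower(), weight)
            for keyword, weight in weighted_keywords.items()]
    conn.executemany(
        "INSERT OR REPLACE INTO doc_keyword (url, keyword, weight) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def find_documents(conn, keyword):
    """Return the URLs tagged with `keyword`, most significant first."""
    cursor = conn.execute(
        "SELECT url FROM doc_keyword WHERE keyword = ? ORDER BY weight DESC, url",
        (keyword.lower(),),
    )
    return [row[0] for row in cursor]


# Hypothetical keywording results for two documents.
store = open_keyword_store()
store_keywords(store, "https://example.org/flood", {"Flood": 0.9, "Rhine": 0.4})
store_keywords(store, "https://example.org/levels", {"Flood": 0.3, "Water level": 0.8})
flood_docs = find_documents(store, "flood")
```

Ordering by weight lets the keyword search rank documents for which a keyword is central ahead of those that merely touch on it.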

OTHER ACCESS OPTIONS FOR USERS

Some environmental information items are updated regularly, partly at very short intervals. Among them are current air and radiation measurement values, flood forecasts, and water levels. In a special area, the user is granted access to these frequently requested information items.

Another access path presents to the users the offers most recently integrated in the EIN and containing major new content.

In addition, a list of suppliers of environmental information is provided, so that the data can be accessed via the names of authorities or institutions. This list is generated from the data collected in the database.

INCORPORATION OF SPECIAL OFFERS

Many, but unfortunately not all, offers can be integrated in the EIN without any further adaptation. Adaptation is needed especially for systems that consist entirely or partly of dynamically generated pages that can be reached via form-based queries only.

An example of such an offer is the Web site of the Statistical State Office of Baden-Wuerttemberg (http://www.statistik-bw.de), which largely consists of dynamically generated tables that are transferred to the user following a selection via a form. A second example is the Web site “Environmental Data and Maps Online” (http://brsweb.lubw.baden-wuerttemberg.de/), which provides access to a large number of current and historical measured values, for example, radioactive radiation or emission data.

For such Web sites, the corresponding interfaces or adaptations have to be generated. In the case of the Statistical State Office, an additional, automatically generated site was established apart from the already existing Web offer. This site is used as a starting point for the crawler of full-text search and offers links to all relevant sub-pages.
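Such an automatically generated starting page can be sketched as a plain HTML list of links. The function `build_seed_page` and the example URLs are hypothetical; how the Statistical State Office actually produces its page is not described in this article.

```python
from html import escape


def build_seed_page(title, links):
    """Return an HTML page that links to every sub-page a crawler should visit.

    `links` is a list of (label, url) pairs, for example one pair per
    form selection that yields a dynamically generated table.
    """
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in links
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical query URLs standing in for form selections.
seed_page = build_seed_page(
    "Crawler entry page",
    [
        ("Waste statistics 2004", "https://example.org/table?id=1&year=2004"),
        ("Water consumption 2004", "https://example.org/table?id=2&year=2004"),
    ],
)
```

The page turns form-based queries into ordinary hyperlinks, so that a crawler that only follows references can still reach the dynamically generated tables.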

QUALITY ASSURANCE

To reach a maximum quality of the information offered, quality assurance tools are available in the EIN. In particular, the availability of the Web sites integrated in the EIN is checked at regular intervals. Furthermore, the administrators and editorial staff are informed automatically about larger structural or content-related modifications of Web sites and, if necessary, may intervene when the indexing of a Web site has failed or update changed URIs.
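A regular availability check of this kind can be sketched as follows. The function names and the rule that any 2xx or 3xx response counts as available are assumptions for illustration, and the scheduling of the check is left out.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return None


def check_availability(urls, fetch_status=http_status):
    """Map each URL to True if it answered with a 2xx or 3xx status."""
    report = {}
    for url in urls:
        status = fetch_status(url)
        report[url] = status is not None and 200 <= status < 400
    return report


def unavailable(report):
    """List the URLs that should be reported to the editorial staff."""
    return sorted(url for url, ok in report.items() if not ok)
```

Passing the status function as a parameter keeps the check testable without network access, for example `check_availability(urls, fetch_status=lambda url: 200)`.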

A great challenge of quality assurance is the detection of redundant contents and information fragments delivered by content management systems.

IMPLEMENTATION

Prototype development in 2003, to demonstrate basic functioning, was followed by the development of a first productive system for the state of Baden-Wuerttemberg in 2004. In January, 2006, a second instance for the state of Saxony-Anhalt went online.

The present implementation is based on a CMS with a back-end database: the software WebGenesis (http://www.webgenesis.de), developed by the Fraunhofer Institut für Informations- und Datenverarbeitung (IITB, Fraunhofer Institute for Information and Data Processing), together with a MySQL database.

In both the prototype and the productive version, the open-source search engine ht://Dig (http://www.htdig.org) is employed for implementing full-text search. Its variety of configuration options provides sufficient flexibility for indexing Web sites and for the search functions required. Configuration files for the full-text search engine are generated regularly via the data interface of the CMS.
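Generating such a configuration file from the site metadata can be sketched as follows. The record fields `start_url` and `limit_to`, the database path, and the function name are assumptions for illustration. The attribute names `database_dir`, `start_url`, and `limit_urls_to` are written as they are understood to appear in ht://Dig configuration files and should be checked against the ht://Dig documentation for the version in use.

```python
def build_htdig_config(sites, database_dir):
    """Return the text of a minimal ht://Dig-style configuration file.

    `sites` is a list of dicts with the hypothetical keys
    'start_url' (where crawling begins) and 'limit_to' (URL prefix
    the crawler must not leave).
    """
    start_urls = " ".join(site["start_url"] for site in sites)
    limits = " ".join(site["limit_to"] for site in sites)
    lines = [
        f"database_dir: {database_dir}",
        f"start_url: {start_urls}",
        f"limit_urls_to: {limits}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical site records as they might be exported from the CMS.
config_text = build_htdig_config(
    [
        {"start_url": "https://example.org/water/", "limit_to": "https://example.org/water/"},
        {"start_url": "https://example.org/soil/index.html", "limit_to": "https://example.org/soil/"},
    ],
    database_dir="/var/lib/htdig/ein",
)
```

Regenerating the file from the metadata on a schedule means that a newly registered site is picked up by the next indexing run without manual configuration.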

FUTURE TRENDS

In the near future, the full-text search engine will be replaced by a more efficient and more flexible system. For this purpose, alternative products are being examined at the moment, among others the open-source frameworks Lucene/Nutch (http://lucene.apache.org). It is also planned to extend the features of the keyword search so that the user can navigate the environmental thesaurus. With a given environmental issue as a starting point, the user shall be able to navigate along the hierarchy and associations to find the documents desired. For the convenience of the user, personalization of the portal shall be integrated in future versions.

While the idea of an Environmental Markup Language (EML) (Arndt, 2000) was not well accepted by the environmental community, the concept of a Semantic Web (Berners-Lee, 2001) has the potential to meet its needs and is currently being established as a world-wide standard. Consequently, environmental applications and data sources have to be enabled to generate such machine-readable data. Thus, the EIN has to improve its search facilities by incorporating a semantic search. The Semantic Web will make more information accessible to the user and, at the same time, reduce expenses, since the programmer will be spared the task of implementing individual interfaces for each application.

CONCLUSION

With the EIN, distributed information is offered by the environmental administrations of the states of Baden-Wuerttemberg and Saxony-Anhalt to the user in a transparent and clear manner.

Following commissioning of the EIN, the expenditure required mainly consists of the administration of metadata and integration of new information offers. For this, the EIN makes available an interface to the user, via which proposals can be made for the integration of further contents and information on modifications of existing sites can be transmitted.

Closer co-operation with the environmental portal PortalU, the upcoming successor of gein®, and an intensified common use of components are aimed at.

KEY TERMS

Content Management System (CMS): A computer software system for organizing and facilitating the collaborative creation of documents and other contents. In this article, a content management system is a Web application used for managing Web sites, Web contents, and metadata.

Environmental Informatics: Research and systems development focusing on environmental sciences in terms of the creation, collection, storage, processing, modeling, interpretation, display, and dissemination of data and information.

Environmental Information: Information on the state of the elements of the environment, such as air, water, soil, land, biological diversity, genetically modified organisms, and the interaction among these elements; factors, such as substances, energy, noise, radiation or waste, emissions, discharges, and other releases into the environment; measures concerning or affecting the environment; reports on the implementation of environmental legislation; and the state of human health and safety.

Full-Text Search: The search engine examines all words in every document stored as it tries to match search words supplied by the user. The most common approach to full-text search is to generate a complete index or concordance for all searchable documents. For each word an entry is made, which lists the exact position of every occurrence of it within the database of documents. From such a list, it is relatively simple to retrieve all the documents that match a query, without having to scan each document.

Keyword Search: Keywords are words that relate to a particular topic. They need not necessarily occur in the full text of a document. Keywords may be provided as meta information within the document or can be created additionally by editorial staff or an automatic keyword generator. Given or generated keywords are stored in a database and can be searched for by the user.

Ontology: In computer science, an ontology is a data model and a form of knowledge representation that represents a domain of the outside world and is used to map the objects in that domain and the relations between them.

Semantic Web: An extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

Web Portal: Sites on the World Wide Web that typically provide personalized capabilities to their visitors. They are designed to use distributed applications, different numbers and types of middleware and hardware to provide services from a number of different sources.
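The index described under Full-Text Search above can be illustrated with a minimal sketch. The function names, the simple tokenization by word characters, and the sample documents are assumptions for illustration; real search engines add stemming, stop words, and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each word to {document id: [positions of the word in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        # Intersect posting lists; no document text is scanned here.
        result &= set(index.get(word, {}))
    return result


# Hypothetical sample documents.
documents = {
    "a": "Flood forecast for the Rhine",
    "b": "Soil and water protection",
    "c": "Water levels and flood warnings",
}
index = build_index(documents)
hits = search(index, "water flood")
```

A query is answered from the word entries alone, which is why matching documents can be retrieved without scanning each document.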
