Interoperability in Geospatial Information Systems

INTRODUCTION

Geospatial information systems (GIS) are an important sector of the information industry, as well as an essential component of the information technology infrastructure (Lo & Yeung, 2002). They are a type of computerized information system specifically designed and used to solve geospatial problems, that is, problems related to locations on the surface of the earth (Longley, Goodchild, Maguire, & Rhind, 2001). The usefulness of GIS has been demonstrated across diverse applications in many disciplines. They have long been used in traditional application settings, such as land management and natural resources, and have recently become an important element in emerging applications, for example, in ubiquitous mobile computing environments.
Since the mid-1990s, the focus of computing has shifted from stand-alone and locally networked environments to wide-scale, distributed, heterogeneous computing infrastructures. This shift, coupled with the exponential growth in the use of the Internet, has enabled and compelled new ways of using GIS. A wide range of GIS applications is now available on the Internet to users anywhere in the world. In addition, the proliferation of wireless and mobile computing technologies, such as cellular phones and personal digital assistants (PDAs), has provided new platforms and paved the way for the emergence of new GIS applications. Because of these advances in computing, GIS applications are now designed, implemented, and applied very differently from their predecessors.
In contrast to their past monolithic design and implementation, GIS are now becoming an integral component of many diverse software packages specifically designed for solving problems in different application domains. In addition, current computing trends suggest that future GIS will be multitiered and used in heterogeneous network environments, where computers of different platforms coexist and tasks are performed in a distributed manner. Consequently, future GIS will have to be interoperable; that is, they will have to be able to work together in a seamless fashion.

BACKGROUND

Efficient and effective use of GIS to solve geospatial problems generally requires special skills. Today’s GIS platforms are mostly designed for workstations or personal computers and provide generic “toolbox” geoprocessing operations that can be broadly applied to many problems in different application domains. To utilize these packages, users must possess certain knowledge and skills. First, they must know how real-world objects are represented in GIS; for example, the boundary of a county is represented as a set of points that defines a polygon, and a railroad is represented as a line object. Second, users must know the range of available geoprocessing operations in GIS and how they are applied to solve geospatial problems. For example, to determine whether a railroad crosses a county boundary, users must know to apply a geometric intersection operation to the polygon that defines the county boundary and the line that defines the railroad. Third, knowledge about geospatial data sources, geospatial data storage, and methods of obtaining geospatial data is also needed. To solve the problem of county boundaries, users must know the sources of data sets for polygons representing county boundaries. They must also know the format and structure in which the data sets are stored. Lastly, users must know how a GIS software package operates and how to use it to solve problems. For example, they must know how to operate the ArcInfo GIS software package and be familiar with the methodology in ArcInfo for incorporating data sets into the project, including format conversion, coordinate transformation, and importing procedures. They must also know the specific commands and syntax for invoking geoprocessing operations in ArcInfo as well as the specific behavior of each operation (e.g., Does the “intersect” operation provided by ArcInfo partially, or fully, solve the problem?).
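The railroad-and-county example can be sketched in a few lines of Python. This is only an illustration of what a generic "intersection" operation has to do at the geometric level, not the way any particular GIS package implements it; the function names and the coordinates are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orientation(p: Point, q: Point, r: Point) -> float:
    """Signed area test: > 0 if p, q, r turn counter-clockwise, < 0 if clockwise, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _on_segment(p: Point, q: Point, r: Point) -> bool:
    """True if r (already known to be collinear with pq) lies within the extent of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segment ab and segment cd share at least one point."""
    o1, o2 = _orientation(a, b, c), _orientation(a, b, d)
    o3, o4 = _orientation(c, d, a), _orientation(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:  # the segments properly cross each other
        return True
    # Remaining cases: an endpoint of one segment lies on the other segment.
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))


def line_crosses_boundary(line: List[Point], polygon: List[Point]) -> bool:
    """True if any segment of the line meets any edge of the polygon's boundary ring."""
    ring = polygon + [polygon[0]]  # close the ring
    for a, b in zip(line, line[1:]):
        for c, d in zip(ring, ring[1:]):
            if segments_intersect(a, b, c, d):
                return True
    return False


# Hypothetical data: a square "county" and two "railroads".
county = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
through_line = [(-5.0, 5.0), (5.0, 5.0), (15.0, 7.0)]  # enters and leaves the county
local_spur = [(2.0, 3.0), (8.0, 6.0)]                  # stays inside the county

crosses_through = line_crosses_boundary(through_line, county)
crosses_spur = line_crosses_boundary(local_spur, county)
```

Even in this small sketch the user has to decide that "crosses" means "shares a point with the boundary ring", which is exactly the kind of mapping from a real-world question to geometry that the text describes.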
These difficulties in using GIS are due to a number of historical and practical reasons. GIS software packages generally approach geospatial problems in terms of abstract geometrical objects and operations, that is, computations on points, lines, and polygons. This imposes a heavy burden on the users because the first task in solving any problem with GIS is to map real-world problems into an environment where GIS techniques and tools can be used. This task is further complicated by the fact that problems in different application domains are often treated differently in GIS. For example, applications in ecology usually involve large-scale raster data and spatiotemporal analysis for visualization purposes, while applications in urban navigation generally involve a smaller geographic extent and are concerned mainly with real-time decision information. In addition, GIS software packages have historically been developed independently with little regard to data sharing (Goodchild, Egenhofer, & Fegeas, 1998). Different GIS software packages use their own proprietary formats, schemas, and terminologies to represent geospatial data and concepts. This has exacerbated the problem of data use, especially when data are to be shared, because sharing requires manual conversion or the availability of import and export tools. The conversion process is often nontrivial and, considering the large volumes of data commonly required in GIS projects, also very time consuming.

MAIN THRUST OF THE ARTICLE

Information Heterogeneity

The aforementioned issues are related to interoperability. The basis for problems related to interoperability is information heterogeneity, which is divided into three levels (Sheth, 1999): syntactic heterogeneity, which refers to the differences in formats and data types; structural heterogeneity, which deals with the differences in data-modeling constructs and schemas; and semantic heterogeneity, which refers to the variations of the intended meanings of concepts and terminologies. Table 1 provides examples of information heterogeneity in GIS.
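As a rough illustration of the first two levels, the sketch below reads the same county record from two sources that differ in format (syntactic heterogeneity: JSON versus CSV) and in field names and nesting (structural heterogeneity). The record contents, field names, and function names are invented for the example.

```python
import csv
import io
import json

# Two hypothetical sources describing the same county.
json_source = '{"county": {"name": "Example County", "code": "00000"}}'
csv_source = "COUNTY_NAME,COUNTY_CODE\nExample County,00000\n"


def from_json(text: str) -> dict:
    """Parse the JSON source and map its nested fields onto a common schema."""
    record = json.loads(text)["county"]
    return {"name": record["name"], "code": record["code"]}


def from_csv(text: str) -> dict:
    """Parse the CSV source and map its flat, differently named columns onto the same schema."""
    row = next(csv.DictReader(io.StringIO(text)))
    return {"name": row["COUNTY_NAME"], "code": row["COUNTY_CODE"]}


same_record = from_json(json_source) == from_csv(csv_source)
```

The two parsers resolve only the syntactic and structural differences. Whether "county" means the same thing to both data providers is a semantic question that no amount of parsing answers.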
The issues of syntactic and structural heterogeneity have been extensively addressed in the past within the computer and information science disciplines. Recently, much research has focused on semantic heterogeneity, which is a significant problem in the field of GIS. In general, semantic heterogeneity is a result of different conceptualizations and representations of things in the world and can be divided into two types (Bishr, 1998):
Cognitive heterogeneity, which arises when two groups of people from different disciplines conceptualize the same real-world facts differently. As an example, a geologist thinks of hill slopes as areas where soil erosion or landslides can occur, but a tourist manager may think of hill slopes as areas where skiing is possible (Dehn, Gartner, & Kikau, 1999).
Naming heterogeneity, which arises when different names are used for identical concepts of real-world facts. For example, hill slope is also known as valley side, mountain flank, or simply slope.
Due to the widespread use of GIS by users both within and across disciplines, semantic heterogeneity is becoming an increasingly important issue in the GIS community. In one example, illustrated by Lutz, Riedemann, and Probst (2003), the semantics of the touch topological operator in the GeoMedia Professional GIS software package differ from those in Oracle 9i Release 2 Spatial (Table 2). In GeoMedia, two polygons satisfy the touch operator if their boundaries and/or interiors intersect. In Oracle, on the other hand, two polygons satisfy the touch operator only if their boundaries, and not their interiors, intersect.
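The difference between the two interpretations can be made concrete with a small sketch. To keep the geometry trivial, it uses axis-aligned rectangles given as (xmin, ymin, xmax, ymax). The two functions encode only the behaviors described in the paragraph above and are not the vendors' actual implementations; all names are assumptions made for the example.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def closed_sets_intersect(a: Rect, b: Rect) -> bool:
    """True if the rectangles share any point, including boundary points."""
    return (max(a[0], b[0]) <= min(a[2], b[2])
            and max(a[1], b[1]) <= min(a[3], b[3]))


def interiors_intersect(a: Rect, b: Rect) -> bool:
    """True if the rectangles overlap in a region of positive area."""
    return (max(a[0], b[0]) < min(a[2], b[2])
            and max(a[1], b[1]) < min(a[3], b[3]))


def touch_permissive(a: Rect, b: Rect) -> bool:
    """'Touch' as boundaries and/or interiors intersecting (the first behavior in the text)."""
    return closed_sets_intersect(a, b)


def touch_strict(a: Rect, b: Rect) -> bool:
    """'Touch' as boundaries intersecting while interiors do not (the second behavior)."""
    return closed_sets_intersect(a, b) and not interiors_intersect(a, b)


left = (0.0, 0.0, 2.0, 2.0)
adjacent = (2.0, 0.0, 4.0, 2.0)     # shares only an edge with `left`
overlapping = (1.0, 1.0, 3.0, 3.0)  # shares area with `left`

results = {
    "adjacent": (touch_permissive(left, adjacent), touch_strict(left, adjacent)),
    "overlapping": (touch_permissive(left, overlapping), touch_strict(left, overlapping)),
}
```

For the edge-sharing pair both definitions agree, while for the overlapping pair only the permissive definition reports a touch, so a query that uses the same operator name can return different answers on two platforms.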
Furthermore, two GIS software packages may use different names for the same spatial operation. For example, the operation for aggregating polygons based on an attribute is called dissolve in the ArcGIS software package, but may be known by others as a merge operation (Figure 1).
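One simple way to cope with naming heterogeneity is a lookup table that maps each package-specific name onto a shared canonical operation. The sketch below assumes such a table; the canonical name, the alias table, and the simplified aggregation (which groups feature identifiers by attribute value and does not union any geometry) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical alias table: package-specific names -> one canonical operation name.
OPERATION_ALIASES = {
    "dissolve": "aggregate_by_attribute",
    "merge": "aggregate_by_attribute",
}


def aggregate_by_attribute(features: List[dict], attribute: str) -> Dict[str, List[int]]:
    """Group feature ids by the value of one attribute (a stand-in for a real polygon union)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for feature in features:
        groups[feature[attribute]].append(feature["id"])
    return dict(groups)


CANONICAL_OPERATIONS = {"aggregate_by_attribute": aggregate_by_attribute}


def run_operation(name: str, features: List[dict], attribute: str) -> Dict[str, List[int]]:
    """Resolve a package-specific operation name to its canonical form and run it."""
    canonical = OPERATION_ALIASES.get(name.strip().lower())
    if canonical is None:
        raise ValueError(f"unknown operation name: {name!r}")
    return CANONICAL_OPERATIONS[canonical](features, attribute)


parcels = [
    {"id": 1, "land_use": "forest"},
    {"id": 2, "land_use": "urban"},
    {"id": 3, "land_use": "forest"},
]
via_dissolve = run_operation("dissolve", parcels, "land_use")
via_merge = run_operation("Merge", parcels, "land_use")
```

A table like this resolves names only. It gives no guarantee that two operations with the same canonical name also behave identically, which is the semantic problem illustrated by the touch operator.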

Table 1. Information heterogeneity in GIS

Information Heterogeneity	Examples
Semantic	Different behaviors of the “intersect” operation from different GIS software packages; different interpretations of the word “within” in a user’s query
Structural	Different data dictionaries when merging two or more data sets; different metadata standards
Syntactic	Different data formats (e.g., Shapefile, ASCII [American Standard Code for Information Interchange], XML [Extensible Markup Language])

Table 2. Topological relationships between two polygons and whether they satisfy the “touch” operator invoked in two different GIS software packages


GIS Software Package	Boundaries intersect, interiors do not	Interiors intersect
GeoMedia Professional GIS	YES	YES
Oracle 9i Release 2 Spatial	YES	NO

These occurrences of semantic heterogeneity can lead to confusion and unexpected outcomes for users who need to deal with multiple GIS platforms or interact with other users who use different GIS platforms. The reconciliation of the differences in semantics must be accomplished by all parties involved in order for them to interoperate.
In another example of semantic heterogeneity, a German motorist in 1998 drove his car into a river after following instructions given by its navigation system. Though there may be other factors that led to the accident, Raubal and Kuhn (2004) hypothesize that the technical factor was that the in-car navigation computer did not make the distinction between a bridge, which is a permanent pathway, and a ferry, which is a transport carrying cars across a river. Though both a bridge and a ferry are pathways that can be used for route computations, a crucial semantic distinction must be made between them when instructions are given to drivers to account for the nonpermanent nature of the ferry.
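The distinction can be sketched by attaching a kind to each route segment and letting the instruction generator depend on it. The class, the segment names, and the wording of the instructions are invented for illustration and say nothing about how the navigation system in the incident actually worked.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    name: str
    kind: str  # "road", "bridge", or "ferry"


def naive_instructions(route: List[Segment]) -> List[str]:
    """Treat every segment as an ordinary pathway, ignoring what kind of pathway it is."""
    return [f"Continue along {segment.name}" for segment in route]


def semantic_instructions(route: List[Segment]) -> List[str]:
    """Generate instructions that respect the difference between permanent and nonpermanent crossings."""
    instructions = []
    for segment in route:
        if segment.kind == "ferry":
            instructions.append(f"Stop at the landing and wait to board the ferry: {segment.name}")
        elif segment.kind == "bridge":
            instructions.append(f"Cross the bridge: {segment.name}")
        else:
            instructions.append(f"Continue along {segment.name}")
    return instructions


# Hypothetical route with a river crossing served by a ferry.
route = [Segment("Main Street", "road"), Segment("River Crossing", "ferry")]
naive = naive_instructions(route)
aware = semantic_instructions(route)
```

Both generators can use the same graph for route computation. Only the second carries enough meaning about each edge to warn the driver that the crossing is not a fixed roadway.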
To overcome information heterogeneity in GIS and provide interoperability among GIS platforms, users, and data, the parties involved must reach an agreement.

The Open Geospatial Consortium Standards

Many standards have been defined for the GIS domain to allow interoperability among different GIS platforms. Currently, the most prominent standards body for GIS is the Open Geospatial Consortium (OGC; http://www.opengeospatial.org). OGC is an organization consisting of companies, government agencies, and universities participating in a consensus process to develop publicly available geoprocessing specifications that result in interoperability among diverse GIS platforms. The OGC was created to address interoperability issues among different GIS platforms, particularly when they are used in the Internet environment. The development process of the OGC involves the creation of abstract specifications and implementation specifications.
The purpose of abstract specifications is to create the conceptual foundation that facilitates understanding of real-world geospatial phenomena, and to allow for the development of implementation specifications by precisely capturing and stating requirements and knowledge of the abstract geospatial domain. Essentially, abstract specifications describe how “ideal” software should work, and they include, but are not limited to, topics on feature geometry, topology, coordinate reference systems, and geospatial metadata. Abstract specifications mainly concern abstract geospatial objects and concepts applicable to GIS (e.g., point, line, and polygon) and do not specifically address real-world, application-context concepts and terminologies (e.g., street, river, forest). For example, the topic Feature Geometry, which is also a draft international standard (ISO 19107 Spatial Schema; Herring, 2001), specifies geometrical and topological objects as well as the operations that can be applied to them.

Figure 1. Operation for aggregating areas based on an attribute may be called differently in different GIS software packages

Implementation specifications provide programmers with specific programming rules and advice for implementing interfaces and protocols that enable interoperability between different GIS platforms. They are engineering specifications that implement part of the abstract specification for particular distributed computing platforms. For example, the Web Map Service (WMS) Implementation Specification specifies the interface for providing mapping services over the Web (Beaujardiere, 2002). Another example, the Geography Markup Language (GML), is a language designed to be a general data format for modeling, transporting, and storing geospatial information (Cox, Daisey, Lake, Portele, & Whiteside, 2003). GML is an XML (Extensible Markup Language) grammar written in XML Schema that provides a variety of object types for describing geography as defined in abstract specifications, including features, coordinate reference systems, geometry, topology, time, and units of measurement.
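To give a sense of what these two specifications look like in practice, the sketch below builds a WMS GetMap request URL using the request parameters of WMS version 1.1.1 and reads the coordinates from a small GML point element. The server address, the layer name, the bounding box, and the coordinate values are placeholders, and the helper function names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Tuple
from urllib.parse import urlencode

GML_NAMESPACE = "http://www.opengis.net/gml"


def build_getmap_url(base_url: str, layer: str,
                     bbox: Tuple[float, float, float, float],
                     width: int, height: int) -> str:
    """Assemble a WMS 1.1.1 GetMap request as a URL with a query string."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                              # empty value requests the default style
        "SRS": "EPSG:4326",                        # spatial reference system of the bounding box
        "BBOX": ",".join(str(v) for v in bbox),    # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params, safe=",")


def parse_gml_point(xml_text: str) -> Tuple[str, Tuple[float, float]]:
    """Return the reference system name and the (x, y) pair of a gml:Point with gml:coordinates."""
    root = ET.fromstring(xml_text)
    coordinates = root.find(f"{{{GML_NAMESPACE}}}coordinates")
    x, y = (float(value) for value in coordinates.text.strip().split(","))
    return root.get("srsName"), (x, y)


# Placeholder server and layer; no request is actually sent.
url = build_getmap_url("http://example.com/wms", "counties", (-80.5, 40.0, -79.5, 41.0), 512, 512)

gml_point = (
    '<gml:Point xmlns:gml="http://www.opengis.net/gml" srsName="EPSG:4326">'
    "<gml:coordinates>-80.0,40.5</gml:coordinates>"
    "</gml:Point>"
)
srs_name, point = parse_gml_point(gml_point)
```

Because both the request interface and the data encoding are standardized, a client written this way does not need to know which GIS product sits behind the server.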
Although the OGC standards address many aspects of information heterogeneity in geoprocessing, they are designed by GIS experts for use by GIS experts in implementing GIS projects. Missing from the OGC standards, however, is the issue of how to allow non-expert users to realize the potential of GIS by making the task of geospatial problem solving easier through semantic interoperability in the application-domain context.

Ontological-Based GIS

There are two key aspects related to semantic interoperability in GIS. First, there is a need for semantic agreement on geospatial data models (e.g., point, line, polygon) and geoprocessing operations (e.g., buffering, intersection). Second, there is a need for semantic agreement about real-world, application-level geographic objects, concepts, and terminologies used in geospatial problem solving. The OGC standards address the first aspect of semantic interoperability by providing a uniform definition and behavior of abstract geometrical objects and geoprocessing operations. However, they do not address the second aspect of semantic agreement regarding the geographic world and application-domain contexts. As previously discussed, one difficulty in using GIS is the mapping of real-world problems into a form that GIS can process. This is arguably the first and most important task of problem solving in GIS, and current technology does not have a means to automate it.
One approach to this problem is to incorporate ontologies into GIS, which would provide shared bodies of semantic knowledge of the geospatial domain. An ontology is a specification of a conceptualization (Gruber, 1993) that allows parties who have agreed to an ontological commitment to communicate with one another and share knowledge. It may include a dictionary of terms and a specification of their intended meanings. The concepts defined in an ontology and how they are interrelated collectively impose a structure on the domain and constrain the possible interpretations of terms (Uschold & Jasper, 1999). In the information-system context, ontologies are machine-processable bodies of knowledge. As such, an ontological-based GIS would allow the use of geospatial information based primarily on its meaning (Fonseca, Egenhofer, Agouris, & Camara, 2002).
In an ontological-based GIS, an ontology would include concepts and terminology about an application domain that users can directly relate to and use. For example, an ontology would define terms specific to the application domain of ecology and how they can be interpreted. This ontology can then be used by the users to formulate their geospatial queries that would conform to the knowledge defined in the ontology. Furthermore, the OGC geoprocessing standards can be considered as another distinct ontology that constrains the meaning and behavior of geometrical objects and operations. Since solving geospatial problems using GIS involves mapping real-world queries into geometries and operations, bridging the two ontologies would provide the means for interpretation of geospatial queries by computers.
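A toy version of this bridging idea is sketched below. One dictionary stands in for an application-domain ontology that maps real-world terms to geometry types, and a second stands in for the geoprocessing ontology that maps relationship words to operations. The vocabulary, the operation names, and the translate function are all hypothetical, and a real ontology would carry far richer definitions than a flat lookup table.

```python
from typing import Dict, Tuple

# Hypothetical application-domain ontology: real-world terms -> geometric representation.
DOMAIN_ONTOLOGY: Dict[str, str] = {
    "railroad": "line",
    "river": "line",
    "county": "polygon",
    "forest": "polygon",
}

# Hypothetical geoprocessing ontology: relationship words -> geometric operations.
RELATION_TO_OPERATION: Dict[str, str] = {
    "crosses": "intersection",
    "is near": "buffer_then_intersection",
}


def translate(subject: str, relation: str, obj: str) -> Dict[str, object]:
    """Map a real-world query onto an operation and the geometry types of its operands."""
    for term in (subject, obj):
        if term not in DOMAIN_ONTOLOGY:
            raise ValueError(f"term not defined in the domain ontology: {term!r}")
    if relation not in RELATION_TO_OPERATION:
        raise ValueError(f"relation not defined in the geoprocessing ontology: {relation!r}")
    operands: Tuple[str, str] = (DOMAIN_ONTOLOGY[subject], DOMAIN_ONTOLOGY[obj])
    return {"operation": RELATION_TO_OPERATION[relation], "operands": operands}


# "Does the railroad cross the county?" expressed in domain terms.
plan = translate("railroad", "crosses", "county")
```

The user states the query in domain terms, and the pair of lookups produces the geometric formulation that would otherwise have to be worked out by hand.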
Recent research efforts on geospatial ontology include cognitive and philosophical aspects of how the real world should be modeled and formalized into ontologies (Mark, Freksa, Hirtle, Lloyd, & Tversky, 1999; Mark, Smith, & Tversky, 1999; Smith & Mark, 1998, 2001), as well as how to use ontologies in GIS (Fonseca et al., 2002; Karimi, Akinci, Boukamp, & Peachavanish, 2003; Kuhn, 2001; Raubal & Kuhn, 2004; Visser, Stuckenschmidt, Schuster, & Vogele, 2002).

FUTURE TRENDS

One of the goals of GIS research, explicitly stated or not, is to advance the technology to a point where it can be used as decision-support systems assisting users in solving a wide variety of problems in many applications. We consider GIS to be decision-support systems when they are easy to use by all users with different backgrounds, able to solve complex problems that otherwise are handled inefficiently, semantically interoperable, and equipped with knowledge and reasoning to provide automated decision-making tasks, especially in real-time applications.
This need is evident from the proliferation of application-specific GIS on the Internet (e.g., Web sites that provide driving directions) and in other distributed environments (e.g., location-based wireless real-time services). To thrive in these heterogeneous, distributed environments, GIS platforms must support interoperability. Additionally, much research is still needed to unlock the potential of GIS for ordinary users solving complex problems. For instance, semantic integration into GIS through ontologies is a research area that would facilitate interoperability at a higher level, allowing the use of GIS by many users with little GIS background, lowering geoprocessing costs, and increasing the usefulness of GIS in general.

CONCLUSION

GIS have come a long way from being simple stand-alone tools that facilitated digital mapping and primitive geoprocessing to information systems capable of performing sophisticated geoprocessing on stand-alone or distributed platforms. This evolution took more than four decades and was made possible through advances in computer geometry, database systems, personal computers, the Internet, and other techniques and technologies. During the same period, the number of applications that adopted GIS technology increased. Today, numerous applications utilize GIS technology to solve problems ranging from simple to complex. However, despite the complex operations that current GIS support and the widespread applications in which they are employed, they markedly lack the ability to interoperate, for various historical and practical reasons. Advances in key areas are needed before GIS become more interoperable and accessible to all users, novice or expert, paving the way for the emergence of new applications.

KEY TERMS

Geoprocessing: Operations in GIS for integrating, analyzing, computing, and presenting geospatial data.
Geospatial Data: Data representing objects on or near the surface of the earth.
Geospatial Information Systems (GIS): Information systems capable of storing, managing, computing, and displaying geospatial data for solving geospatial problems.
Geospatial Problems: Problems involving geospatial data, objects, and phenomena.
Information Heterogeneity: The differences in syntax, structure, and semantics used in different information systems.
Interoperability: The ability of two or more heterogeneous systems to work together in a seamless manner.
Ontology: A conceptualization and representation of objects and phenomena and the relationships among them in a domain.
Standard: A set of concepts, terminologies, and methodologies agreed upon by a given community.