Managing Uncertainty in a Geospatial Model of Biodiversity Part 1

Introduction

The boreal forest spans the northern hemisphere through Canada, Russia and Alaska and plays an important role in the societies which depend on it. In addition to its vital role in Earth-atmosphere interactions, the boreal forest is of great economic importance as a renewable natural resource. It provides valuable and numerous pulp, paper and lumber products that are used by all facets of society. It is in our interest to sustain this resource and it is, therefore, the subject of much scientific research geared toward understanding the mechanisms and processes of which it is a part.

Fire has always played an important role in the proper functioning of the boreal forest. The serotinous pine cones of the tree species, for instance, will not open and release their seeds without exposure to extremely high temperatures and, therefore, rely on fire for regeneration (Cameron, 1953; Johnson, 1992). Although the boreal forest is a dynamic ecosystem with continual disturbance by fire, it has been capable of maintaining itself in a relatively stable state. With this in mind, harvesting practices seek to mimic this form of natural disturbance. However, only by understanding the processes that govern natural mortality and regeneration, can this goal be realized. Biological diversity (or biodiversity) is one important aspect of this research and it is the subject of this study. The biodiversity of the boreal forest is not a dynamic process itself, rather, it is the product of many processes. These include internal processes operating within the ecosystem as well as external processes operating on the ecosystem.


Currently, ecologists are able to measure and map biodiversity of flora in the boreal region by collecting various data on the vegetation in forested stands from ground-based surveys. Although this method is by far the most accurate, it is a very costly, time-consuming and labour intensive process. From these ground observations, ecologists can also study and characterize the processes operating in the boreal forest. If the underlying processes which shape biodiversity can be determined and parametrized in a numerical model, they can potentially be used to predict biodiversity. However, whether biodiversity is estimated directly from the ground data or predicted with numerical models from ground-measured input variables, only stand-based estimates of diversity are derived. Our understanding of biodiversity over the landscape is, therefore, limited by these sparse data. To increase our understanding, there is a need to map diversity over an entire landscape.

Biodiversity

In the broadest sense, biological diversity is natural variation. This natural variation can occur at the level of the molecule, gene and species. At these levels, biological diversity can be described across many scales such as the forest stand or over landscape units such as a drainage basin (Huston, 1994).

Ecologists generally use two components to describe biological diversity or biodiversity. The first is species richness (S) which is simply a count of the number of different species present in a given area. The second is known as evenness which is also referred to as species relative abundance and describes the proportions of species in the area. A natural ecosystem is usually composed of a few species that are very abundant within an area and many species which are much less abundant. A measure of evenness accounts for this aspect of diversity (May, 1975; Magurran, 1988). Both components are accepted representations of diversity and we have chosen to use richness in this research. Having said that, we should mention that relying strictly on a measure of richness to describe diversity is potentially dangerous because it does not describe the distribution of species in the area.

In the context of forests, it is important to note that trees usually contribute a greater quantity of usable natural resources to a site than herbaceous plants. High diversity areas, therefore, are not necessarily beneficial. For example, shortly after an area has been harvested, herbaceous plants and shrubs that were once starved for light and competing for nutrients and moisture with the mature canopy above would be free to grow. In addition, invasive species that have migrated in from nearby areas will be able to grow. Within a short time span (which could be years or months), the number of species in the area would be very high but not necessarily beneficial to the organisms that use the area (Harper and Hawksworth, 1995; Bormann and Likens, 1979). For example, Andow (1991) reviewed the literature on arthropods in monocultures versus polycultures and found that increases in vegetation diversity adversely affect herbivorous arthropod populations.

Objectives

The overall objective of our ongoing research was to understand the processes governing the spatial pattern of herbaceous plants on forested landscapes. One possible dimension of this pattern is biodiversity and we are currently developing methods for mapping biodiversity. This work involves a multidisciplinary team of ecologists, mathematicians, geographers and engineers. The research aims to reveal the landscape characteristics that are controlling diversity and development of empirical models both for estimating these characteristics from geospatial data such as remotely sensed imagery and digital elevation models and to estimate the diversity itself using predictive vegetation mapping systems. An integral component of this research is the estimation of the uncertainty in both the landscape characteristics and the biodiversity.

The model used currently to map biodiversity is a simple multivariate linear regression that is largely intended to explore the importance of the four controlling landscape variables (Chipman and Johnson, 2002). This model has been implemented at the landscape level by estimating (mapping) the input variables using geospatial data (Warren, 1999). The objective of the work reported in this topic is to present methods for estimating the uncertainty in landscape variables and for propagating this uncertainty through the biodiversity model and produce uncertainties in the final map. In particular the work aims to answer three questions:

1. How well can richness be estimated using estimated versus observed landscape variables? In other words, to what extent do the errors in the estimated landscape variables influence the richness predictions?

2. How accurate are richness predictions relative to field observations?

3. What is the level of uncertainty in richness predictions?

2 Study Area and Data

The study site is part of the boreal mixed-wood forest located in and around Prince Albert National Park (PANP), central Saskatchewan (53° 35′ N to 53° 20′ N and from 106° 47′ W). PANP is located on the southern fringe of the boreal forest which gives way to an expansive agricultural region to the south (Figure 11.1). The following physical description of the area has been summarized from Bridge (1997).

The geomorphology of the area has been defined primarily by the glacial events of the Pleistocene (~ 12 000 years ago). Although glacial tills dominate the area, there are also substantial organic glaciofluvial (of glacial river origin) and glacio-lacustrine (of glacial lake origin) deposits. The topography is composed of rolling hills with an elevation range of 500 to 800 m above sea level. Long cold winters and short cool summers are characteristic of the regional climate. Between 400 and 500 mm of precipitation is received by the area with approximately 70 % falling as rain.

There are two major disturbance regimes at work in the area: forest fires and forest harvesting. The first includes both human induced and natural lightning caused fires although the latter make up the majority. Weyerhaeuser Canada operates a Forest Management License Area (FMLA) in the region and is responsible for the regeneration of harvested areas. However, within PANP there is no harvesting and the only major source of natural disturbance is fire.

 Location of the study area. The boreal region is shown by the grey shaded areas with the southern mixed-wood boreal forest delineated by the darker grey belt

Figure 11.1 Location of the study area. The boreal region is shown by the grey shaded areas with the southern mixed-wood boreal forest delineated by the darker grey belt

Eight tree species dominate the area which comprise both coniferous and deciduous species. The coniferous species are Picea glauca (Moench) Voss, Picea mariana (Mill.) B.S.P., Pinus banksiana Lamb., Abies balsamea (L.) Mill., Larix laricina (DuRoi) K. Kock. Populus tremuloides Michx., Populus balsamifera L. and Betula papyrifera Marsh. make up the deciduous species. There are also many herbaceous shrubs and ground cover plants that contribute to the vegetation composition.

Stand data

More than 150 stands were sampled over a period of three years. Stand data collected in the field included species counts of canopy and sapling trees, shrubs and herbaceous plants as well as stand density. These data were used to calculate species richness for each stand. In addition, estimates of the age of each stand were made from tree cores. Details of the stand-based field sampling methods maybe found in Warren (1999) and Chipman and Johnson (2002). For each stand the distance to the nearest ridgeline from topographic maps was estimated. Landscape properties which were measured in the field are referred to in this topic as observed. These included richness, canopy stem density, canopy species type, distance to ridgeline and canopy age.

Geospatial data

Vector data representing elevation contours, rivers and lakes were made available by Parks Canada for PANP and surrounding adjacent lands. These were used to interpolate an elevation model (DEM) of the area using the ArcInfo software. The contour interval of the topographic vectors was 8 m. These vector GIS files were not provided with any metadata so that errors of unknown type and quantity existed within these files. More details on the interpolation and processing of the DEM may be found in Warren (1999).

Landsat Thematic Mapper (TM) imagery of the PANP acquired on 10 June 1996 was used. The image was virtually cloud free and included all seven spectral bands with a 30 m spatial resolution (120 m for channel 6). The imagery was referenced to the WGS-84 ellipsoid in a UTM projection system. This processing resulted in resampling the image pixels to 25m ground sample spacing to match the DEM.

In addition to electro-optical data, SIR-C (Shuttle Imaging Radar) polarimetric synthetic aperture radar (SAR) image data were obtained from the Jet Propulsion Laboratory (JPL). These data were acquired on 4 and 6 October 1994. Software supplied by the JPL (NASA Jet Propulsion Laboratory, 1993, 1994; Chapman, 1995; Vuu et al., 1995) was used to strip the raw data from the tape, synthesize the resulting Stokes matrix for each pixel into linear polarization images (HH,HV, VH and VV) and finallyto perform multi-looking (theeffectivenumber oflookswas seven). Theground resolution for both images was 12.5 m. Together, the two images cover a substantial portion of the PANP study site.

The original SAR imagery was filtered using a gamma-gamma filter to remove some of the speckle noise (Lopes et al., 1993). Close inspection of the imagery prior to filtering revealed great local variation in backscatter amplitude. This was especially apparent in the L-band imagery. After filtering, this variation had been reduced. Using approximately 40 ground control points for each SAR image, the images were transformed into the WGS-84 ellipsoid with an RMS error of approximately one pixel (25 m).

A raster map of time-since-fire was obtained from the University of Calgary, Biological Sciences Department, Ecology Division. This map was derived from data records acquired and maintained by Weir (1996).

Methods

Biodiversity model

Chipman and Johnson (2002) developed an ecological model which is able to predict biological diversity of herbaceous plants in boreal forest (R2 = 0.52). The variables of interest are (1) the distance of a stand from the drainage basin ridgeline, (2) the time since the last fire (i.e. forest age), (3) the canopy species type and (4) the density of canopy species. These four variables are surrogate variables which represent underlying ecological processes which relate to the use of plant resources. Specifically, the resources of interest are moisture, nutrients and light. The model was founded on theories which relate the availability and use of these resources to biodiversity. The ecological significance of these four variables is discussed in Warren (1999) and Chipman and Johnson (2002).

The general form of the species richness prediction equation developed by Chip-man and Johnson (2002) is:

tmp255-364

wheretmp255-365is the intercept,tmp255-366are parameter estimates for the variables time-since-fire (TSF), distance-from-ridgeline (DFR) and canopy-stem-density (CSD) respectively, and are parameter estimates for the categorical dummy variables jack pine (JP), black spruce (BS) and trembling aspen (TA) respectively. The categorical forest type variables are binary. Only one of these variables may have a value of one at any one time while all others must have a zero value. The mixed class is not included in the equation but is represented when all other species have a value of 0. The effect of these binary variables is to change the intercept of the model. The values of the model parameters were estimated from field observations of the four landscape characteristics and richness.

To estimate richness at the landscape scale, the four landscape characteristics were estimated from geospatial data and the model run for each pixel. A schematic diagram describing the overall system for producing a richness map is given in Figure 11.2 and further details may be found in Warren (1999). In this section, the methods for estimating the uncertainty in the estimated landscape characteristics and for propagating this uncertainty through the model are discussed.

Time-since-fire

The time-since-fire variable (TSF) was used as a surrogate for the manipulation of resources as a result of a disturbance process. Weir (1996) produced a 1:50 000 scale map of 1249 (TSF) polygons for the PANP based on a combination of aerial photograph interpretation and field reconnaissance. He also estimated the uncertainty for polygon fire ages and these are summarized in Table 11.1. Where a range of uncertainty was specified, the upper limit of the range was used in order to be conservative.

In addition to the measurement uncertainty associated with the fire ages, there was also spatial uncertainty in the location of the polygon boundaries. This spatial uncertainty manifested itself in the form of additional uncertainty in the ages of the forests in the polygons.

Table 11.1 Expert opinion estimates of time-since-fire measurement uncertainty

Fire age (years)

Estimated uncertainty (years)

0-149

± 1

150-200

± 1-5

tmp255-369

±15-20

 

A schematic diagram showing the overall system for deriving a richness map

Figure 11.2 A schematic diagram showing the overall system for deriving a richness map

Although TSF is a continuous variable that can be any value greater than or equal to zero it must in effect be treated as ordinal data. That is, the fire ages within each polygon are essentially ranked, discrete attributes. To illustrate, consider a point on the exact boundary between two adjacent polygons of ages 10 and 205 years. This point must take on the fire age of one of the two polygons (either 10 or 205 years) and cannot take on an intermediate value. If there is any uncertainty in the location of the fire polygon boundaries, there must be uncertainty attached to the ages assigned to points which are close to or on these boundaries. The magnitude of this uncertainty will be expected to increase with an increase in the magnitude of the difference in adjacent polygon fire ages. For instance, the border point between a polygon representing a region with a large TSF that neighbours one with a small TSF will have a very high age uncertainty relative to a border point between two polygons representing areas of similar age.

A question that must still be answered is what constitutes a boundary point? As one follows a straight line transect from a shared polygon boundary to the centre of the polygon, the expected uncertainty will decrease such that the uncertainty estimate will be partly a function of distance. However, this uncertainty-distance relationship is not known so a conservative worst-case approach to uncertainty assessment was adopted. At some point along this transect there will be a very high probability that the TSF is correct. It is here that the spatial uncertainty in the polygon border was assumed to no longer have an effect on the estimated TSF. The distance from the boundary to this point along the transect will define the width of a buffer zone within the polygon as shown in Figure 11.3. The buffer zones represent areas in which the amount of uncertainty is in question. In the cross-section view, this is shown with the rectangle labelled as the zone of unknown uncertainty (Figure 11.3). Since the uncertainty at specific points within this zone is unknown, the uncertainty interval is taken as the lower and upper limits regardless of distance from the boundary.

Uncertainty in boundary regions. Illustrated here is the idea that the location of a boundary line between time-since-fire polygons is fuzzy. The inner polygons represent areas for which the uncertainty can be reasonably estimated. The shaded buffer zones within the perimeter of the polygons represent areas for which uncertainty cannot be accurately defined. There is some unknown probability that the point within the 205-year polygon should actually belong within the 10-year polygon which as a result, manifests itself in a wide range of uncertainty as shown by the upper and lower limits

Figure 11.3 Uncertainty in boundary regions. Illustrated here is the idea that the location of a boundary line between time-since-fire polygons is fuzzy. The inner polygons represent areas for which the uncertainty can be reasonably estimated. The shaded buffer zones within the perimeter of the polygons represent areas for which uncertainty cannot be accurately defined. There is some unknown probability that the point within the 205-year polygon should actually belong within the 10-year polygon which as a result, manifests itself in a wide range of uncertainty as shown by the upper and lower limits

Considering the point near the shared polygon boundary in Figure 11.3, the age could be as old as 225 years or as young as 9 years. This gives rise to asymmetric uncertainty intervals such that a point labelled as representing a forest that was 205 years old could be + 20 or -196 years from this estimate:

tmp255-372

Using this conservative approach we have the following two methods for estimating TSF uncertainty:

1. For areas inside a polygon in which we are reasonably sure that the TSF is correct (areas with attribute uncertainty but not spatial uncertainty), the uncertainty can be represented with a traditional interval based on the attribute uncertainties from Table 11.1. For example, a TSF of 170 years can be represented by 170 ± 5 years or alternatively by a lower and upper limit (165, 175) respectively.

2. For the buffer zones (areas with attribute and spatial uncertainty), the uncertainty can be represented by a lower and upper limit. These limits are defined by the youngest and oldest possible ages of any neighbouring polygons to which a point might belong. This was illustrated by the example above. Notice that these limits include the known attribute uncertainty from Table 11.1. In the event that more than two polygons intersect, only the youngest and oldest TSFs will be of interest in defining the uncertainty bounds.

These two methods are the basis for estimating and mapping the uncertainty in the TSF map.

Canopy type

The canopy type variable was used as a surrogate for the processes which control light transmission to the forest floor. As mentioned earlier, there were four forest classes used in the biodiversity model. These were jack pine, black spruce, trembling aspen and a mixed class (MIX) of white spruce, balsam fir and trembling aspen. In addition to the forest classes, water (WAT), anthropogenic (ANT), and wetland (WET) classes were also defined (each with a training and testing set of data). The anthropogenic class included towns, roads, recently harvested land as well as other artificial features.

The reference data were used to perform two k-nearest neighbour supervised classifications. The first used the seven channels of the Landsat TM imagery and the second used the six SAR image channels. The results for these two classifiers (Table 11.2) show that the two classifiers complemented each other. While the accuracy derived from the SAR data was fairly low, the accuracy for black spruce was higher than observed in the Landsat TM-based classification. One way to combine the information in the SAR and Landsat TM data sets was to combine the 13 channels in a single feature space but the results obtained were poor. An alternative was to combine the class-conditional likelihoods of the individual classifiers. A Dempster-Shafer evidential reasoning algorithm was used and the accuracies, shown in Table 11.3 as class-conditional kappa values as a measure of the uncertainty of the canopy type prediction, were increased.

Next post:

Previous post: