Uncertainty in Land Cover Mapping from Remotely Sensed Data using Textural Algorithm and Artificial Neural Networks Part 2

Spatial distribution of the studied land cover classes

The complexity of the boundaries in the study area was analysed from the land cover representation produced from the photointerpretation (ground data). Data layers with buffers (corridors) along boundaries were constructed using 3 x 3, 5 x 5 and 7 x 7 windows. Nearly half of the pixels were located in the nearest proximity of borders (3 x 3 window) and as much as 77% of the data lay within a 7 x 7 boundary buffer (Figure 7.5). Table 7.1 presents the distribution of pixels in the proximity of boundaries for each class: deciduous forests form the smallest and most elongated patches, while patches of arable lands were generally the largest and most compact. The urban class was also characterized by a high degree of patchiness.
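The buffer statistics described above can be reproduced on any labelled raster. The following is a minimal sketch rather than the procedure used in the study: it assumes that a pixel belongs to the k x k buffer when the k x k window centred on it contains at least one pixel of a different class. The function names and the toy map are illustrative only.

```python
def near_boundary(labels, window):
    """Return a boolean grid: True where the window centred on a pixel
    contains a pixel of another class (assumed buffer definition)."""
    half = window // 2
    rows, cols = len(labels), len(labels[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            centre = labels[r][c]
            # Scan the window, clipped at the image edges.
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    if labels[rr][cc] != centre:
                        mask[r][c] = True
    return mask


def buffer_percentage(labels, window):
    """Percentage of all pixels that fall inside the boundary buffer."""
    mask = near_boundary(labels, window)
    total = sum(len(row) for row in mask)
    inside = sum(sum(row) for row in mask)
    return 100.0 * inside / total


# Toy map (hypothetical): left half is class 0, right half is class 1.
toy_map = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
for size in (3, 5, 7):
    print(size, round(buffer_percentage(toy_map, size), 1))
```

Per-class figures such as those in Table 7.1 would follow from restricting the count to pixels of one class.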

Figure 7.4 Spatial ‘signatures’ or responses. Variograms derived from red band (TM3)

Methods

Computation of variogram-derived texture measures

To explore the capabilities of variogram-derived measures, both univariate and multivariate estimators of the variogram were used (Cressie, 1993; Journel and Huijbregts, 1978; Wackernagel, 1995). Experimental univariate variograms were calculated for each pixel in each of the six non-thermal Landsat TM wavebands and the ERS-1 SAR image. Multivariate variograms (cross variograms and pseudo-cross variograms) were calculated between pairs of the first three principal components derived from the six non-thermal Landsat TM wavebands. A principal components analysis (PCA) was undertaken to reduce the number of possible waveband combinations (from 15 to 3), while retaining as much as 98.7% of the original image variance. A bounded variogram model was assumed: if a range was not achieved during the calculation, it was set at the maximum allowable lag (lag = 34, corresponding to approximately 1 km on the ground). The maximum value of the range was, however, rarely reached, with a mean value below 200 m (5-7 pixels). Ranges calculated from data acquired in the infrared wavebands were slightly larger than those derived from the visible wavebands. The shortest ranges were noted for the ERS-1 SAR image (below 100 m). This suggested that the textural information content of the SAR image had not been lost during the pre-processing stage.
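The three estimators named above can be illustrated on a one-dimensional transect of pixel values. This is a hedged sketch using the standard textbook definitions of the experimental variogram, cross variogram and pseudo-cross variogram; the study worked on two-dimensional moving windows, and its exact rule for detecting the range is not stated. Here the range is assumed to be the first lag at which semivariance stops increasing, falling back to the maximum lag as described in the text. All function names are illustrative.

```python
def semivariogram(z, max_lag):
    """Experimental variogram for lags 1..max_lag along a transect."""
    gammas = []
    for h in range(1, max_lag + 1):
        sq = [(z[i] - z[i + h]) ** 2 for i in range(len(z) - h)]
        gammas.append(sum(sq) / (2 * len(sq)))
    return gammas


def cross_variogram(zj, zk, max_lag):
    """Cross variogram: products of increments of two variables."""
    gammas = []
    for h in range(1, max_lag + 1):
        prod = [(zj[i] - zj[i + h]) * (zk[i] - zk[i + h])
                for i in range(len(zj) - h)]
        gammas.append(sum(prod) / (2 * len(prod)))
    return gammas


def pseudo_cross_variogram(zj, zk, max_lag):
    """Pseudo-cross variogram for lags 0..max_lag: squared differences
    between one variable at x and the other at x + h (non-zero at lag 0)."""
    values = []
    for h in range(0, max_lag + 1):
        sq = [(zj[i] - zk[i + h]) ** 2 for i in range(len(zj) - h)]
        values.append(sum(sq) / (2 * len(sq)))
    return values


def estimate_range(gammas):
    """Assumed rule: lag (1-based) of the first local maximum; if the
    variogram keeps rising, return the maximum allowable lag."""
    for h in range(1, len(gammas)):
        if gammas[h] <= gammas[h - 1]:
            return h
    return len(gammas)


transect = [0, 1, 2, 3, 4, 5]          # hypothetical pixel values
gammas = semivariogram(transect, 3)
print(gammas, estimate_range(gammas))
```

For the steadily rising toy transect the range is never reached, so the sketch returns the maximum lag, mirroring the fallback to lag = 34 in the study.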


A large set of indices (a total of 89) was calculated for each central pixel within a moving window of a changeable size (Jakomulska and Clarke, 2001; Jakomulska and Stawiecka, 2002) from both univariate and multivariate variograms. These included sills and ranges, mean and sum of semivariances calculated up to a range, slope of the variogram, ratios of semivariances for consecutive pairs of lags, sum of absolute differences between variogram model (spherical, exponential and Gaussian) and experimental variogram model, ratios of sills for band pairs, and measures of variogram shape.
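A few of the indices listed above can be derived from a single experimental variogram as follows. This is an illustrative sketch only: the input is assumed to be a list of semivariances for lags 1, 2, ... and a range expressed as a lag number, and the slope is taken here as the average rise per lag up to the range, which is an assumption because the text does not define it.

```python
def texture_indices(gammas, range_lag):
    """Summarize a variogram (gammas[0] is lag 1) up to a given range."""
    upto = gammas[:range_lag]
    ratios = [upto[h] / upto[h - 1]
              for h in range(1, len(upto)) if upto[h - 1] > 0]
    slope = (upto[-1] - upto[0]) / (range_lag - 1) if range_lag > 1 else 0.0
    return {
        "range": range_lag,
        "sill": upto[-1],                      # semivariance at the range
        "sum_semivariance": sum(upto),
        "mean_semivariance": sum(upto) / len(upto),
        "lag_ratios": ratios,                  # consecutive-lag ratios
        "slope": slope,                        # assumed definition
    }


print(texture_indices([1.0, 2.0, 3.0, 3.0], 3))
```

Because the sum and mean run only up to the range, they change with both the height and the extent of the variogram, which is why the text later describes them as approximating its shape.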

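Table 7.1 Percentage of pixels within proximity of a border

| Land cover class   | 3 x 3 window | 5 x 5 window | 7 x 7 window |
|--------------------|--------------|--------------|--------------|
| Built-up areas     | 66           | 85           | 91           |
| Coniferous forests | 44           | 68           | 80           |
| Deciduous forests  | 72           | 93           | 99           |
| Meadows            | 59           | 83           | 91           |
| Arable lands       | 30           | 50           | 65           |
| Average            | 43.4         | 64.8         | 76.8         |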

Figure 7.5 Buffers along borders (in white) in (a) a 3 x 3 and (b) 7 x 7 window

Due to the large number of variables, a subset of those most suitable for textural image classification was chosen for further analysis. Screening (through an assessment of the discriminating power of the variables) was achieved using decision binary trees (Jakomulska and Stawiecka, 2002), because their structure is explicit and, therefore, easily interpretable (Safavian and Landgrebe, 1991; Friedl and Brodley, 1997). Decision binary trees use a hierarchical framework to partition observations sequentially, where the criteria for partitioning the data space at each node can vary for each class. This is an important advantage in textural classification, since some classes may be classified using solely spectral information, some using only textural information, while other classes require both spectral and textural information in order to be classified accurately. Univariate decision trees were constructed using both spectral and textural wavebands. An example of a simplified, non-overlapping decision binary tree is shown in Figure 7.6.
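The screening idea can be sketched as the search a univariate decision tree performs at a single node: each variable is tested on its own, and the variable and threshold giving the purest two-way partition are kept. The splitting criterion used in the study is not stated; Gini impurity is assumed here purely for illustration, and the sample values and variable order are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples, labels):
    """Return (variable index, threshold, impurity decrease) of the best
    single-variable threshold split, as at one node of a univariate tree."""
    n = len(labels)
    parent = gini(labels)
    best = (None, None, 0.0)
    for var in range(len(samples[0])):
        values = sorted(set(row[var] for row in samples))
        # Candidate thresholds lie midway between consecutive values.
        for low, high in zip(values, values[1:]):
            thr = (low + high) / 2
            left = [lab for row, lab in zip(samples, labels) if row[var] <= thr]
            right = [lab for row, lab in zip(samples, labels) if row[var] > thr]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best[2]:
                best = (var, thr, gain)
    return best


# Hypothetical pixels: [green reflectance, pseudo-cross variogram sill]
pixels = [[0.20, 5.0], [0.25, 1.0], [0.30, 5.2], [0.35, 0.9]]
classes = ["coniferous", "other", "coniferous", "other"]
print(best_split(pixels, classes))
```

Variables that are never selected at any node of the grown tree would be the ones dropped from further analysis.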

Coniferous forests were distinguished from all the other classes by a large sill in the pseudo-cross variogram calculated between the first and the third principal component (Figure 7.6). A small sill in the pseudo-cross variogram calculated between the second and the third principal component distinguished vegetated areas (deciduous forest and meadows) from barren regions (arable lands and built-up areas). Arable lands were characterized by smaller mean semivariances (calculated up to a range) in the blue waveband relative to that observed for built-up areas. Finally, meadows were distinguished from deciduous forests by a spectral characteristic, namely a large reflectance in the green waveband (TM band 2).

Textural bands that did not participate in the construction of the decision trees were dropped from further analyses. All of the original spectral wavebands (TM1, TM2, TM3, TM4, TM5, TM7 and ERS SAR) and the 34 selected textural variables were, therefore, available for inclusion in further analyses (Table 7.2). Range, sill, mean semivariances and sum of semivariances (up to a range) from univariate variograms (calculated on Landsat TM bands) and from cross variograms (calculated on the first three principal components (PCs) derived from the Landsat TM data) were used.

Figure 7.6 A simplified non-overlapping decision binary tree constructed on textural and spectral bands. Variables used in decision rules: PCA13.6 – sill of a pseudo-cross variogram calculated between the first and the third PC, PCA23.6 – sill of a pseudo-cross variogram calculated between the second and the third PC, TM2 – original green band of TM, V1.3 – mean of semivariances calculated up to a range on blue band (TM1)

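Table 7.2 Textural variables derived from the Landsat TM and ERS-1 SAR images

| Variables derived from: | Variogram           | Variogram       | Cross variogram                     | Pseudo-cross variogram              |
| Calculated on:          | 6 Landsat TM bands  | ERS-1 SAR image | 3 PCs derived from Landsat TM bands | 3 PCs derived from Landsat TM bands |
|-------------------------|---------------------|-----------------|-------------------------------------|-------------------------------------|
| Sill                    | 6                   | -               | 3                                   | 3                                   |
| Semivariance at lag = 0 | -                   | -               | -                                   | 3                                   |
| Sum of semivariances    | -                   | -               | 3                                   | -                                   |
| Mean semivariances      | 6                   | 1               | 3                                   | -                                   |
| Number of variables     | 12                  | 1               | 9                                   | 6                                   |

Total number of variables without ranges: 28

Range (variogram, 6 Landsat TM bands): 6

Total number of variables with ranges: 34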

For pseudo-cross variograms (on PCs), the sill (semivariance at a range) and the semivariance at lag = 0 were retained. Only one textural waveband (mean semivariance) calculated from the ERS-1 SAR imagery proved to have useful discriminating power and was used in the classification. With the exception of pseudo-cross variances at lag = 0, all texture indices were derived at a range or involved computation based on consecutive variances up to a range. Mean semivariances and cross-variances calculated up to a range incorporate both sill and range information and approximate the shape of the variogram. In the further analyses reported below, three data sets were used:

1. spectral data only (7 variables: 6 Landsat TM wavebands and 1 ERS-1 SAR data set);

2. spectral data combined with textural measures derived from within the moving window: sills, mean semivariances and pseudo-cross variances at lag = 0 (7 spectral plus 28 textural variables, giving a total of 35 variables);

3. spectral data combined with all textural wavebands, including six ranges calculated for each Landsat TM spectral band, corresponding to the kernel sizes (7 spectral plus 34 textural variables, giving a total of 41 variables).

Artificial neural networks

A standard three-layered feed-forward artificial neural network (ANN) using a Levenberg-Marquardt second-order training method was employed for supervised classification (Haykin, 1994; Hagan and Manhaj, 1995; Svozil et al., 1997). Since the size and composition of the training data are of major importance to a classifier’s ability to discriminate classes, a training set in which each class had equal representation was formed. This contained 200 pixels for each class, and was derived from training areas identified visually through on-screen digitizing. Pixels for training were selected both from within the core of patches and from the vicinity of patch boundaries. This process was based on expert decision, and no particular rule was adopted (since that would require a precise definition of ‘core’ and boundary areas).

In the first experiment, only the spectral wavebands of the Landsat TM, with no textural information of any kind, were used. Several network architectures, represented by the number of neurons (or units) in the input (I), hidden (H) and output (O) layers (I:H:O), were tried by varying the size of the hidden layer. Here, the number of hidden neurons was systematically varied from 4 to 40. However, in no case did convergence occur when utilizing solely the Landsat TM spectral data. This was in striking contrast to the second and third sets of experiments outlined below, where the summed square error per pattern during ANN training could be reduced to a very small value (e.g. 5 x 10^-2 or less).

In the second set of experiments, 35 selected spectral and textural wavebands (data set without range) sufficed to build small networks with what was perceived to be very good convergence properties. Very rapid convergence was observed for every training session with only four neurons in the hidden layer (a network architecture of 35:4:5).

In the third experiment, in addition to the 35 variables used in the second experiment, the data corresponding to the sliding window size (variogram range) were also used. The same number of hidden neurons as before, that is, a network with a 41:4:5 architecture, sufficed for a similar, very rapid convergence during every training session.

Accuracy assessment

Classification accuracy was evaluated with the aid of a confusion matrix that shows a cross-tabulation of the class labels in the output of a classification against those in the ground data. To assess classification accuracy, a testing set is typically chosen using some sampling design. In this study, the whole data set (excluding the training data) was used in assessing classification accuracy. This approach was chosen for two reasons. Firstly, this approach avoided complexities arising through the influence of the sampling design used to acquire the reference data and ensured that the composition of testing samples reflected the true class proportions (Congalton, 1991; Richards, 1996). Secondly, the approach aided the study of the spatial distribution of misclassified pixels. Due to the high density of boundaries in the studied data, the ‘border effect’ of textural classification was examined through the analysis of the spatial distribution of misclassified pixels with respect to buffers created along class boundaries (derived from the ground data).

Following Gopal and Woodcock (1994), two criteria, based on the membership (the network activation level) and difference in class memberships, were used to discriminate between pixels classified with (i) a high degree of confidence, (ii) a degree of uncertainty and (iii) those for which there was so little confidence in any allocation that they were left unclassified. The magnitude of the output neuron activation level was taken as a measure of the strength of membership to the class associated with that neuron. A threshold of 0.9 in the activation level of output neurons was applied to discriminate between pixels classified with a high level of confidence to the allocated class and those with a low maximum activation level. Pixels with the highest neuron activation above 0.9 were further divided into two subgroups. The first contained confidently classified pixels, where the difference between the maximum and the second largest output neuron activation levels was larger than 0.25. The second comprised pixels classified with some degree of uncertainty, with the difference between the highest and second highest output neuron activation levels < 0.25.
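The two criteria can be written as a small decision rule. The sketch below uses the thresholds given in the text (0.9 for the activation level and 0.25 for the difference between the two largest activations); the function name, the return format and the treatment of a difference of exactly 0.25 as uncertain are assumptions made for illustration.

```python
def confidence_category(activations, level=0.9, margin=0.25):
    """Assign a pixel to 'confident', 'uncertain' or 'unclassified'
    from the activation levels of the output neurons (one per class)."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    best, second = order[0], order[1]
    if activations[best] <= level:
        # No output neuron is strongly activated: leave unclassified.
        return "unclassified", None, None
    if activations[best] - activations[second] > margin:
        return "confident", best, None
    # High maximum, but a second class is nearly as strongly activated.
    return "uncertain", best, second


print(confidence_category([0.95, 0.10, 0.05, 0.02, 0.01]))
print(confidence_category([0.95, 0.80, 0.05, 0.02, 0.01]))
print(confidence_category([0.60, 0.30, 0.05, 0.02, 0.01]))
```

Returning the second candidate class for uncertain pixels makes it possible to score a pixel as correct when either of its two highest responses matches the ground data.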

Results

Since training on spectral wavebands only did not bring convergence, further analysis was limited to comparison of the second and third experiment of neural network classification. Both experiments were based on data sets containing a combination of spectral and textural measures, the former without and the latter with the range from the variogram used as a variable in the analysis. The results are summarized in Table 7.3.

Analysis of pixels classified with a high degree of confidence

Classification using 35 variables (without ranges) resulted in greater confidence in the classification (3% more pixels with the highest neuron activation above the 0.9 threshold) than classification using 41 variables (with ranges). However, the overall classification accuracy increased from 64.3% to 71.0% (kappa coefficients of agreement of 0.48 and 0.53, respectively) after the inclusion of ranges. The addition of data on the variogram range to the textural set of bands increased either the user’s or the producer’s accuracy of most of the classes. User’s accuracy is the probability that a pixel classified on the image actually represents that class on the ground and is a measure of the commission error, while producer’s accuracy indicates the probability of a reference pixel being correctly classified and is a measure of the omission error (Congalton, 1991). On average, the accuracy of each class except arable lands increased by several per cent (Table 7.4). The largest differences were noted for meadows (an increase of 13%) and built-up areas (an increase of 7%); the kappa coefficient of agreement increased by 0.17 and 0.08, respectively.

Table 7.3 Neural network classification results: percentage of confident, uncertain and undecided (unclassified) data

Percentage of classified data:

| Threshold | Category                              | Without ranges | With ranges |
|-----------|---------------------------------------|----------------|-------------|
| Above 0.9 | Single high activation (confident)    | 80             | 77          |
| Above 0.9 | Multiple high activations (uncertain) | 4              | 12          |
| Below 0.9 | Undecided                             | 16             | 11          |

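Table 7.4 Confusion matrices: training on 35 variables (without ranges; first value in each cell) and 41 variables (including ranges; second value in each cell). Columns indicate the label in the ground data while rows show the label in the classified image

| Classified / Ground     | Built-up    | Coniferous  | Deciduous   | Meadows       | Arable lands  | Total         | User’s accuracy (%) |
|-------------------------|-------------|-------------|-------------|---------------|---------------|---------------|---------------------|
| Built-up                | 2011 / 1881 | 170 / 866   | 1339 / 613  | 2451 / 816    | 6280 / 3796   | 12251 / 7972  | 16.4 / 23.6         |
| Coniferous              | 51 / 2      | 3714 / 2614 | 343 / 171   | 59 / 49       | 74 / 28       | 4241 / 2864   | 87.6 / 91.3         |
| Deciduous               | 162 / 238   | 1926 / 1392 | 2694 / 2670 | 1144 / 902    | 2485 / 1931   | 8411 / 7133   | 32.0 / 37.4         |
| Meadows                 | 399 / 128   | 19 / 9      | 1104 / 522  | 9800 / 8412   | 2054 / 707    | 13376 / 9778  | 73.3 / 86.0         |
| Arable lands            | 649 / 1068  | 101 / 363   | 306 / 894   | 3322 / 4671   | 25875 / 31271 | 30253 / 38267 | 85.5 / 81.7         |
| Total                   | 3272 / 3317 | 5930 / 5244 | 5786 / 4870 | 16776 / 14850 | 36768 / 37733 | 68532 / 66014 |                     |
| Producer’s accuracy (%) | 61.5 / 56.7 | 62.6 / 49.8 | 46.6 / 54.8 | 58.4 / 56.6   | 70.4 / 82.9   | 100.0         | 64.3 / 71.0         |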
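The accuracy measures reported in Table 7.4 follow directly from the confusion matrix. The sketch below recomputes them for the classification without ranges, using the standard definitions of overall, user’s and producer’s accuracy and of the kappa coefficient; the variable and function names are illustrative.

```python
# Confusion matrix for the classification without ranges (Table 7.4).
# Rows: label in the classified image; columns: label in the ground data.
CLASSES = ["Built-up", "Coniferous", "Deciduous", "Meadows", "Arable lands"]
WITHOUT_RANGES = [
    [2011, 170, 1339, 2451, 6280],
    [51, 3714, 343, 59, 74],
    [162, 1926, 2694, 1144, 2485],
    [399, 19, 1104, 9800, 2054],
    [649, 101, 306, 3322, 25875],
]


def accuracy_measures(matrix):
    """Return overall accuracy, user's and producer's accuracies, kappa."""
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]          # classified totals
    col_totals = [sum(col) for col in zip(*matrix)]    # ground totals
    diagonal = [matrix[i][i] for i in range(len(matrix))]
    overall = sum(diagonal) / n
    users = [d / t for d, t in zip(diagonal, row_totals)]
    producers = [d / t for d, t in zip(diagonal, col_totals)]
    # Agreement expected by chance from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - chance) / (1 - chance)
    return overall, users, producers, kappa


overall, users, producers, kappa = accuracy_measures(WITHOUT_RANGES)
print(round(100 * overall, 1), round(kappa, 2))
for name, u, p in zip(CLASSES, users, producers):
    print(name, round(100 * u, 1), round(100 * p, 1))
```

Run on these counts, the sketch reproduces the overall accuracy of 64.3% and a kappa coefficient of about 0.48 quoted in the text.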

In the data set without ranges, the major confusion between classes occurred for the following pairs of classes: built-up areas and deciduous forests, built-up areas and meadows, built-up areas and arable lands, coniferous and deciduous forests, deciduous forests and meadows, deciduous forests and arable lands and meadows and arable lands.

Most of the confusion arose from the similarity of both spectral and textural response (e.g. between two types of forests or between built-up areas and arable lands). Confusion between meadows and arable lands appeared to arise from the generalization of the ground data (due to the dispersed character of cultivated parcels, many small meadows were merged with larger arable land classes in the generalization process). However, confusion between classes characterized by a ‘good’ separability of both spectral and textural responses (e.g. built-up areas and deciduous forests; built-up areas and meadows) was a direct result of the classification algorithm used. It is believed that the problem lies in the buffering side effect of the textural analysis.

The smallest patches in the study area were generally built-up areas, meadows and deciduous forests. The confusion between the three classes decreased after the addition of the six variogram ranges. Figure 7.7 (see plate 2) shows a large buffering effect in the results of classification without ranges, while the addition of variogram range reduced this effect. The classification without ranges over-estimated built-up areas and deciduous forests, which appeared erroneously at many borders. The major decrease was noted in the commission error of meadows and deciduous forests; and in the omission error of arable lands and deciduous forests.

To investigate border effects, the spatial distribution of incorrectly allocated pixels was analysed. Proximity to boundaries was assessed through an analysis of the distribution of pixels with respect to buffers along class borders (Figure 7.8). It was noted that in cases where the commission error of a class decreased after ranges were added (built-up, deciduous and meadows), the number of pixels within the proximity of boundaries had decreased. On the other hand, with arable lands (for which there was a decrease in the omission error), the number of pixels within the buffer increased (Figure 7.8). Hence, the differences were not randomly distributed in space, but concentrated at the boundaries.

Analysis of pixels classified with uncertainty

The addition of variogram ranges increased the uncertainty of the classification, in terms of the percentage of pixels with more than one output neuron providing a high activation level (4% and 12% for the data sets without and with ranges, respectively). Pixels with multiple high neuron activations (highest activation > 0.9 and difference between the highest and second highest activation less than 0.25) were grouped in the ‘uncertain’ category. Accuracy quantified using only the highest neuron activation was low: 51% and 61% (without and with ranges, respectively). However, if either the highest or the second highest response was accepted, the accuracy increased to 85% and 94%.

Figure 7.8 Amount of pixels within buffers along boundaries. Results for the textural classification without and with ranges are shown.

The spatial distribution of uncertain pixels for the data set without ranges was only slightly correlated with boundaries, and it is hypothesized that this subset comprised mixed pixels, being a result of the relatively large Landsat TM pixel size in relation to the size of the small land cover patches. For the data set with ranges, spatial correlation with borders was important, particularly for deciduous forests and meadows.

In spite of the present uncertainty, even within this dubiously classified group of pixels, it was possible to classify the data correctly. Assuming the condition that either of the two highest responses was correct, in combination with confidently classified pixels, the overall classification accuracy increased to 65.3% and 74.2% (without and with variogram ranges, respectively).

Analysis of unclassified pixels

Greater confidence in the classification was observed with than without variogram ranges, in terms of the total number of classified data (11% and 16% of data were left unclassified, respectively). Furthermore, with ranges the unclassified cases were distributed fairly regularly over the whole data set (although not totally at random), while in the classification without ranges the unclassified pixels were concentrated along boundaries. Again, this result indicated that inclusion of ranges decreased the uncertainty and error for pixels located along class boundaries.

Finally, a small group of pixels was observed with very low output neuron activation levels (the activation level of all output neurons was below 0.6). Most of these pixels belonged to the arable land class. Visual inspection of a sample of these pixels revealed that they were of a relatively distinct sub-class, wet arable lands, characterized by low reflectance in all spectral wavebands. Hence, in spite of having the same land use and land cover, the wet arable lands differed in spectral response from the more common agricultural lands. Note also that, due to the small extent of the wet arable lands, sites of this class had not been included in training the classification.

Summary and Conclusions

The potential of textural classification has been demonstrated. Inspection of the classification procedure showed that the majority of the discriminating variables were textural bands extracted from the variogram. It has been further shown that the addition of the variables describing window size increased the accuracy of neural network classification both qualitatively and quantitatively: both the accuracy and confidence of the classification (in terms of the number of correctly classified pixels) increased by several per cent. It could be argued that the addition of any other information might increase the accuracy, since the dimensionality of the data used increased (unless the Hughes phenomenon occurred). However, it has been shown that the major differences between the two classifications were spatially correlated and occurred along the boundaries of classes in the study area. It appears that the use of variogram ranges partially corrects for the problem of mixed pixels distributed along boundaries.

Despite the above, the overall classification accuracy remains low, with a large percentage of unclassified (11%) and uncertain (12%) data, and many misclassified pixels (28.9%). It is recognized that further refinement of both textural quantification and classification techniques should be pursued. Foremost, the neural network analyses indicated that the within-class variance of the training sample was not fully representative, resulting in a low accuracy of the classification of the testing data. Derivation of very pure training samples reduces the generalization ability of the classifier and, in spite of the high training accuracy, reduces the accuracy of the prediction for unseen data. However, repeating the analyses reported in this study using randomly selected training data did not result in network convergence. Therefore, more attention should be paid to the choice of the training data set, since in the case of textural classification both spectral and textural representativeness are important. Furthermore, considerable noise and some unavoidable discrepancies between satellite and aerial sensor images (ground data) introduced factors reducing the ability to assess precisely both the feasibility and accuracy of the methods tested.

Many studies have shown that textural information derived from within a window of an adapting size may be more useful than that derived from a kernel of a fixed size. However, as the window size varies, the number of pixels and the number of lags used in the calculation of the variogram will change, and this may influence the estimation of both the sill and the range of the variogram (for small windows neither may be reached). This effect is partially accounted for when the size of the window is itself used as a variable in the classification. However, further improvement of the adapting window algorithm would be helpful. Ideally, pixels at the borders (for which a small window size is used) should be classified solely on the spectral or contextual information, or directional variograms could be used to adapt the window size differently in different directions. Although these techniques are computationally challenging, they show that there is potential to further extend the capabilities of textural analysis.
