Image Processing Reference
In-Depth Information
Lam et al. 1998). These same objectives were also tested using syntactic pattern recognition systems employing graph-based methodologies (Barnsley et al. 2003), object-oriented algorithms (Chapter 10), and automated expert systems. More recently, spatial metrics have revitalized fractal geometry as part of a suite of indices, including the contagion and patch density measurements (Herold et al. 2002), as well as spatial metrics directly related to urban sprawl (Hasse and Lathrop 2003). Central to the formulation of these metrics is the concept of photomorphic regions, or homogeneous urban patches, which are routinely extracted from aerial photographs but are more of a challenge from satellite sensor imagery. The two principal criticisms of spatial metrics are that their functionality is completely dependent on an initial spectral characterization of the satellite sensor imagery, and that they are conspicuously absent from the actual process of characterizing the homogeneous classes. In this sense they merely measure the outcome of the classification, regardless of its accuracy. Nevertheless, spatial metrics have channeled contemporary remote sensing research towards group- and object-based classification, including modifications of established texture analysis based on the spatial co-occurrence matrix (Haralick et al. 1973), geostatistics (Carr 1999; Pesaresi and Bianchin 2001) and wavelet theory (Myint 2003).
8.3 Modified Maximum Likelihood Classification
There are many hard classifications, some statistically deterministic (minimum distance, parallelepiped); others, like the popular maximum likelihood (ML), are based on stochastic mechanisms. The objective is to assign the most likely class w_j, from a set of N classes, w_1, …, w_N, to any feature vector x in the image. A feature vector x is the vector (x_1, x_2, …, x_M), composed of pixel values in M features (in most cases, spectral bands). The most likely class w_j for a given feature vector x is the one with the highest posterior probability Pr(w_j | x). Therefore, all Pr(w_j | x), j ∈ [1, …, N] are calculated, and the w_j with the highest value is selected. The calculation of Pr(w_j | x) is based on Bayes' theorem,
Pr(w_j | x) = Pr(x | w_j) × Pr(w_j) / Pr(x)    (8.1)
On the left-hand side is the posterior probability that a pixel with feature vector x should be classified as belonging to class w_j. The right-hand side is based on Bayes' theorem, where Pr(x | w_j) is the conditional probability that some feature vector x occurs in a given class: in other words, the probability density of w_j as a function of x. Supervised classifications, such as the ML, derive this information from training samples. Often, this is done parametrically by assuming normal class probability densities and estimating the mean vector and covariance matrix. Alternatively, it is possible to use Markov random fields (Berthod et al. 1996), or nonparametric