An Introduction to Remotely Sensed Data Analysis - Sampling Spatial Units for Agricultural Surveys

and the city block distance (also called the L1 or Manhattan distance) is defined as

$$ d_{CB}(x_1, x_2) = \sum_{m=1}^{M} |x_{1m} - x_{2m}|, \qquad (4.18) $$

where M is the number of spectral bands.
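As a small illustration of Eq. (4.18), the following sketch computes the city block distance between two pixels. The function name `city_block_distance` and the pixel values are hypothetical and chosen only for this example; each pixel is assumed to be a plain list of M band values.

```python
def city_block_distance(x1, x2):
    """Return the L1 (Manhattan) distance between two pixel vectors."""
    if len(x1) != len(x2):
        raise ValueError("pixel vectors must have the same number of bands")
    # Sum the absolute band-by-band differences for m = 1, ..., M.
    return sum(abs(a - b) for a, b in zip(x1, x2))


# Two hypothetical pixels observed in M = 4 spectral bands.
pixel_a = [52, 110, 87, 203]
pixel_b = [48, 115, 90, 190]
distance = city_block_distance(pixel_a, pixel_b)
print(distance)
```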

It is possible to determine clusters of pixels in the image using a distance measure. We use the sum of squared errors (SSE) as the objective function, which measures the quality of a clustering. In other words, we calculate the error of each point (its Euclidean distance to the closest centroid) and then compute the total sum of the squared errors

$$ SSE = \sum_{C_i} \sum_{x \in C_i} (x - \mu_i)^t (x - \mu_i), \qquad (4.19) $$

where $C_i$ is the $i$-th cluster and $\mu_i$ is its mean vector. SSE has a theoretical minimum of zero, where all clusters contain a single data point.
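The SSE of Eq. (4.19) can be evaluated directly once the pixels have been grouped. The sketch below assumes that a clustering is given as a list of clusters, each a list of pixel vectors; the helper names `squared_distance`, `mean_vector`, and `sse` and the data are hypothetical and serve only as an illustration.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance (x - mu)^t (x - mu)."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def mean_vector(points):
    """Band-wise mean of a non-empty list of pixel vectors."""
    n = len(points)
    return [sum(band) / n for band in zip(*points)]


def sse(clusters):
    """Sum of squared errors over all clusters, as in Eq. (4.19)."""
    total = 0.0
    for points in clusters:
        mu = mean_vector(points)
        total += sum(squared_distance(x, mu) for x in points)
    return total


# Hypothetical clustering of three two-band pixels into two clusters.
clusters = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 10.0]],
]
print(sse(clusters))
```

With one pixel per cluster, every pixel coincides with its own mean vector and the function returns zero, the theoretical minimum.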

The objective function in Eq. (4.19) can be minimized using an iterative procedure known as K-means or migrating means. The K-means method (MacQueen 1967) is one of the simplest unsupervised learning algorithms. This procedure classifies a given data set using an a priori fixed number of clusters, C. The algorithm consists of the following steps:

• Step 1. Place C points into the multispectral space. These points represent the initial group centroids.

• Step 2. Assign each pixel to the group that has the closest centroid.

• Step 3. When all pixels have been assigned, recalculate the positions of the C centroids.

• Step 4. Repeat Steps 2 and 3 until the centroids no longer move. This separates the objects into groups.
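The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation, and it rests on several assumptions not stated in the text: the initial centroids are drawn at random from the pixels themselves, the Euclidean distance is used in Step 2, an empty cluster keeps its previous centroid, and the iteration stops when the assignments no longer change (which implies that the centroids no longer move). The names `k_means`, `seed`, and `max_iter` are hypothetical.

```python
import random


def squared_distance(x, mu):
    """Squared Euclidean distance between a pixel and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def mean_vector(points):
    """Band-wise mean of a non-empty list of pixel vectors."""
    n = len(points)
    return [sum(band) / n for band in zip(*points)]


def k_means(pixels, n_clusters, seed=0, max_iter=100):
    """Return (centroids, labels) after running Steps 1 to 4."""
    rng = random.Random(seed)
    # Step 1: choose C pixels as the initial centroids (assumed initialization).
    centroids = [list(p) for p in rng.sample(pixels, n_clusters)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each pixel to the group with the closest centroid.
        new_labels = [
            min(range(n_clusters), key=lambda i: squared_distance(x, centroids[i]))
            for x in pixels
        ]
        # Step 4: stop when the assignments, and hence the centroids, are stable.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recalculate the position of each centroid.
        for i in range(n_clusters):
            members = [x for x, lab in zip(pixels, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean_vector(members)
    return centroids, labels


# Six hypothetical two-band pixels forming two well-separated groups.
pixels = [[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]]
centroids, labels = k_means(pixels, n_clusters=2)
print(centroids, labels)
```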

When the centroids are randomly initialized, different runs of the K-means algorithm typically produce different values for the objective function. Choosing the proper initial centroids is key to the basic K-means procedure. A technique that is commonly used to address this problem is to perform multiple runs, each with a different set of randomly chosen initial centroids. The set of clusters with the minimum SSE is then chosen as the solution. For other possible initial values for the centroids, see Everitt et al. (2011). The principal limitation of this method is the need to pre-specify the number of clusters, a choice that also influences the computational effort of the procedure.
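The multiple-run strategy can be sketched as follows. The code repeats a compact K-means run from different random starting centroids and keeps the run with the smallest SSE. The function names `k_means_once` and `best_of_runs`, the parameters `n_runs` and `seed`, and the data are hypothetical; the initial centroids are assumed to be drawn from the pixels themselves.

```python
import random


def squared_distance(x, mu):
    """Squared Euclidean distance between a pixel and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def k_means_once(pixels, n_clusters, rng, max_iter=100):
    """One K-means run from random initial centroids; returns (sse, labels)."""
    centroids = [list(p) for p in rng.sample(pixels, n_clusters)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(n_clusters), key=lambda i: squared_distance(x, centroids[i]))
            for x in pixels
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(n_clusters):
            members = [x for x, lab in zip(pixels, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(band) / len(members) for band in zip(*members)]
    # Objective function: total squared distance of each pixel to its centroid.
    sse = sum(squared_distance(x, centroids[lab]) for x, lab in zip(pixels, labels))
    return sse, labels


def best_of_runs(pixels, n_clusters, n_runs=10, seed=0):
    """Run K-means n_runs times and keep the solution with the minimum SSE."""
    rng = random.Random(seed)
    runs = [k_means_once(pixels, n_clusters, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: run[0])


# Nine hypothetical two-band pixels forming three groups.
pixels = [[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10],
          [20, 1], [21, 2], [20, 2]]
best_sse, best_labels = best_of_runs(pixels, n_clusters=3, n_runs=20, seed=1)
print(best_sse, best_labels)
```

Because all runs share one random number generator, each run starts from a different draw of initial centroids, and the returned SSE can never exceed that of any single run in the batch.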

One possible modification of the K-means algorithm is the iterative self-organizing data analysis technique (ISODATA, Ball and Hall 1965). In addition to the K-means steps, this algorithm merges clusters if their separation distance in multispectral space is less than a specified value, and splits a single cluster into two if a splitting condition is satisfied.
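The merging rule can be illustrated with a short sketch. It covers only the merge step; the split step is omitted because the splitting condition is not specified here. The sketch assumes that the separation between two clusters is the Euclidean distance between their centroids and that a merged centroid is the size-weighted mean of the two originals. The names `merge_close_clusters` and `min_separation` are hypothetical and are not part of the ISODATA specification.

```python
import math


def merge_close_clusters(centroids, sizes, min_separation):
    """Merge clusters whose centroids are closer than min_separation.

    centroids: list of centroid vectors; sizes: number of pixels per cluster.
    Returns new (centroids, sizes) lists; the inputs are not modified.
    """
    centroids = [list(c) for c in centroids]
    sizes = list(sizes)
    merged = True
    while merged:
        merged = False
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                if math.dist(centroids[i], centroids[j]) < min_separation:
                    n = sizes[i] + sizes[j]
                    # Size-weighted mean of the two centroids (assumed rule).
                    centroids[i] = [
                        (sizes[i] * a + sizes[j] * b) / n
                        for a, b in zip(centroids[i], centroids[j])
                    ]
                    sizes[i] = n
                    del centroids[j], sizes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return centroids, sizes


# Three hypothetical clusters; the first two are closer than the threshold.
new_centroids, new_sizes = merge_close_clusters(
    [[0, 0], [1, 0], [10, 10]], [2, 2, 5], min_separation=2.0
)
print(new_centroids, new_sizes)
```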
