Pktools - Open Source Geospatial Tools: Applications in Earth Observation


If an input test set (-i) is provided, it is used for the accuracy assessment. If not, the
accuracy assessment is based on a cross validation (-cv) of the training sample.
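To make the idea of a k-fold cross validation concrete, the following minimal sketch splits a number of sample units into folds and reports how many units are held out for testing and how many remain for training in each fold. The helper kfold and its round-robin assignment are assumptions made for illustration; they are not part of pktools and do not claim to reproduce how the utilities partition the training sample internally.

```shell
#!/bin/sh
# kfold N K: split N sample units into K folds (round-robin) and report,
# for each fold, how many units are held out for testing and how many train.
# Hypothetical helper for illustration only; it is not part of pktools.
kfold() {
    awk -v n="$1" -v k="$2" 'BEGIN {
        if (k < 2 || n < k) exit 1          # need at least 2 folds, 1 unit per fold
        for (i = 0; i < n; i++) size[i % k]++
        for (f = 0; f < k; f++)
            printf "fold %d: test=%d train=%d\n", f + 1, size[f], n - size[f]
    }'
}

kfold 100 2   # two-fold: each model is trained on half of the sample
kfold 100 5   # five-fold: each model is trained on four fifths of the sample
```

With fewer folds, fewer models have to be trained per assessment, at the price of training each model on a smaller share of the sample.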

The optimization routine uses a grid search. The initial and final values of
the parameters can be set with -cc startvalue -cc endvalue and -g
startvalue -g endvalue for cost and gamma respectively. The search uses
a multiplicative step for iterating the parameters (set with the options -stepcc and
-stepg). A commonly used approach is to define a relatively large multiplicative step
first (e.g. 10) to obtain an initial estimate for both parameters. The estimate can then
be refined by defining a smaller step (> 1) with constrained start and end values
for the parameters cost and gamma.
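The coarse-then-fine strategy can be sketched in a few lines of shell. The helper grid below is hypothetical (it is not a pktools utility) and assumes that the search visits the start value, then repeatedly multiplies by the step until the end value is exceeded. The refined range of 10 to 1000 is a made-up example of what a coarse pass might suggest.

```shell
#!/bin/sh
# grid START END STEP: print START, START*STEP, START*STEP^2, ... up to END.
# Hypothetical helper for illustration only; it is not part of pktools.
grid() {
    awk -v start="$1" -v end="$2" -v step="$3" 'BEGIN {
        if (start <= 0 || step <= 1) exit 1   # the sequence must grow
        # The small tolerance keeps END in the list despite rounding error.
        for (v = start; v <= end * (1 + 1e-9); v *= step) printf "%g\n", v
    }'
}

echo "coarse pass, step 10:"
grid 0.01 100000 10

# Suppose the coarse pass favoured a cost near 100 (made-up value):
# search again between 10 and 1000 with a smaller multiplicative step.
echo "fine pass, step 2:"
grid 10 1000 2
```

A step of 1 or less would never reach the end value, which is why the sketch rejects it.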

The following code performs a grid search based on a two-fold cross validation
of the training sample. The options -cc 0.01 -cc 100000 set the start and end
values for the cost parameter in the grid search. Similarly, the options -g 0.001
-g 1000 set the start and end values for the parameter gamma. Both parameters are
increased with a multiplicative step of 10 after each iteration. We set the verbose
level to 1 to print the results of the objective function as a function of the input
parameters cost and gamma for each iteration.

pkoptsvm -t training_landsat.sqlite -cv 2 -cc 0.01 -cc 100000 -g 0.001 -g 1000 -step 10 --ccost 10 --gamma 10 -v 1
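To get a feeling for the size of this search, the sketch below enumerates every combination of cost and gamma for the ranges and step used in the command above. The helper pairs is hypothetical and assumes that both end values are included in the grid; the actual iteration order and boundary handling of pkoptsvm may differ.

```shell
#!/bin/sh
# pairs CMIN CMAX CSTEP GMIN GMAX GSTEP: print every (cost, gamma) pair
# of a multiplicative grid.
# Hypothetical helper for illustration only; it is not part of pktools.
pairs() {
    awk -v cmin="$1" -v cmax="$2" -v cstep="$3" \
        -v gmin="$4" -v gmax="$5" -v gstep="$6" 'BEGIN {
        if (cstep <= 1 || gstep <= 1 || cmin <= 0 || gmin <= 0) exit 1
        tol = 1 + 1e-9   # keeps the end values despite rounding error
        for (c = cmin; c <= cmax * tol; c *= cstep)
            for (g = gmin; g <= gmax * tol; g *= gstep)
                printf "cost=%g gamma=%g\n", c, g
    }'
}

# Ranges and step taken from the example command; count the combinations.
pairs 0.01 100000 10 0.001 1000 10 | wc -l
```

Under these assumptions the grid holds 8 cost values and 7 gamma values, that is 56 combinations, each of which requires its own cross validation. This is why a coarse step is a sensible first choice.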

12.5.5 Feature Selection

Classification problems dealing with high dimensional input data can be challenging
due to the Hughes phenomenon (Hughes 1968). Hyperspectral data, for instance, can
have hundreds of spectral bands and require special attention when being classified.
In particular, when limited training data are available, the classification of such data
can be problematic without reducing the dimension.

The SVM classifier has been shown to be more robust to this type of problem than
others (Melgani and Bruzzone 2004; Plaza et al. 2009). Nevertheless, classification
accuracy can often be improved with feature selection methods. The utility pkfssvm
implements a number of feature selection techniques, among which a sequential
floating forward search (SFFS) (Pudil et al. 1994).

We show how to select 16 features from a training sample that is based on a high
dimensional dataset. The utilities pksvm, pkoptsvm and pkfssvm share many
command line options. As with the other utilities, the training vector dataset for
pkfssvm must contain both the labels and band information for each sample unit.
In this example, we use a five-fold cross validation (set the option -cv 2 to speed
up the process).
