If an input test set (-i) is provided, it is used for the accuracy assessment. If not, the
accuracy assessment is based on a cross validation (-cv) of the training sample.
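To make the cross validation concrete, the following Python sketch computes a k-fold cross-validated overall accuracy: each fold is predicted by a model trained on the remaining folds. It is not the pkoptsvm code. The stratified fold assignment, the function names and the nearest-class-mean classifier (a stand-in for the SVM so that the example runs without extra libraries) are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# A sample unit: (feature vector, class label).
Sample = Tuple[List[float], str]
FitPredict = Callable[[Sequence[Sample], Sequence[List[float]]], List[str]]


def stratified_folds(labels: Sequence[str], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class round-robin
    so that every fold keeps roughly the class proportions of the sample."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def cv_accuracy(samples: Sequence[Sample], k: int, fit_predict: FitPredict) -> float:
    """Overall accuracy of k-fold cross validation."""
    folds = stratified_folds([y for _, y in samples], k)
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        test = [samples[i] for i in fold]
        predicted = fit_predict(train, [x for x, _ in test])
        correct += sum(p == y for p, (_, y) in zip(predicted, test))
    return correct / len(samples)


def nearest_mean(train: Sequence[Sample], features: Sequence[List[float]]) -> List[str]:
    """Stand-in classifier (minimum distance to class mean), NOT an SVM."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    means = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def dist2(a: List[float], b: List[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(means, key=lambda y: dist2(x, means[y])) for x in features]


# Hypothetical two-band training sample with two well-separated classes.
samples = [([0.1, 0.2], "water"), ([0.2, 0.1], "water"), ([0.0, 0.3], "water"),
           ([5.0, 4.8], "forest"), ([4.9, 5.1], "forest"), ([5.2, 5.0], "forest")]
print(cv_accuracy(samples, 2, nearest_mean))  # 1.0
```

Stratifying the folds guarantees that every training fold contains all classes, which matters when some classes have only a few sample units.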
The optimization routine uses a grid search. The initial and final values of
the parameters can be set with -cc startvalue -cc endvalue and -g
startvalue -g endvalue for cost and gamma, respectively. The search uses
a multiplicative step for iterating the parameters (set with the options -stepcc and
-stepg). A common approach is to first define a relatively large multiplicative step
(e.g., 10) to obtain an initial estimate for both parameters. The estimate can then
be refined by defining a smaller step (> 1) with constrained start and end values
for the parameters cost and gamma.
The following command performs a grid search based on a two-fold cross validation
of the training sample. The options -cc 0.01 -cc 100000 set the start and end
values of the cost parameter in the grid search. Similarly, the options -g 0.001
-g 1000 set the start and end values of the parameter gamma. Both parameters are
increased by a multiplicative step of 10 after each iteration. We set the verbose
level to 1 to print the value of the objective function for each combination of the
input parameters cost and gamma.
pkoptsvm -t training_landsat.sqlite -cv 2 -cc 0.01 -cc 100000 -g 0.001 \
  -g 1000 -stepcc 10 -stepg 10 -v 1
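The coarse-to-fine strategy described above can be sketched in Python. Only the ranges and the step of 10 come from the example command; the function names, the assumption that the end value is included in the grid, the refinement step of 2 and the toy objective (a smooth function standing in for the cross-validated accuracy that pkoptsvm evaluates) are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Objective = Callable[[float, float], float]


def multiplicative_grid(start: float, end: float, step: float) -> List[float]:
    """start, start*step, start*step**2, ... up to end (assumed inclusive)."""
    if start <= 0 or step <= 1:
        raise ValueError("need start > 0 and a multiplicative step > 1")
    # Small tolerance so that e.g. log10(1000) = 2.9999999999999996 counts as 3.
    n = math.floor(math.log(end / start, step) + 1e-9)
    return [start * step ** i for i in range(n + 1)]


def grid_search(objective: Objective,
                cost_range: Tuple[float, float, float],
                gamma_range: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Return (best score, cost, gamma) over the full cost x gamma grid."""
    best = (-math.inf, math.nan, math.nan)
    for cost in multiplicative_grid(*cost_range):
        for gamma in multiplicative_grid(*gamma_range):
            score = objective(cost, gamma)
            if score > best[0]:
                best = (score, cost, gamma)
    return best


def toy_objective(cost: float, gamma: float) -> float:
    """Hypothetical stand-in for cross-validated accuracy, peaking at
    cost=300, gamma=0.05. A real run would train and validate an SVM here."""
    return -((math.log10(cost) - math.log10(300)) ** 2
             + (math.log10(gamma) - math.log10(0.05)) ** 2)


# Coarse pass with the ranges and step of the pkoptsvm example.
coarse = grid_search(toy_objective, (0.01, 100000, 10), (0.001, 1000, 10))
# Fine pass: one coarse step on either side of the estimate, smaller step (2 > 1).
_, c, g = coarse
fine = grid_search(toy_objective, (c / 10, c * 10, 2), (g / 10, g * 10, 2))
print(coarse)  # best near cost=100, gamma=0.1
print(fine)    # best near cost=320, gamma=0.04
```

Under the inclusive-end assumption, the coarse grid of the example has 8 cost values and 7 gamma values, i.e. 56 parameter combinations, each assessed with a two-fold cross validation. The fine pass then only searches a narrow window around the coarse estimate.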
12.5.5 Feature Selection
Classification problems dealing with high dimensional input data can be challenging
due to the Hughes phenomenon (Hughes 1968). Hyperspectral data, for instance, can
have hundreds of spectral bands and require special attention when being classified.
In particular, when limited training data are available, the classification of such data
can be problematic without first reducing the dimensionality.
The SVM classifier has been shown to be more robust to this type of problem than
other classifiers (Melgani and Bruzzone 2004; Plaza et al. 2009). Nevertheless, classification
accuracy can often be improved with feature selection methods. The utility pkfssvm
implements a number of feature selection techniques, among which is the sequential
floating forward search (SFFS) (Pudil et al. 1994).
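The idea behind SFFS can be shown with a short Python sketch: after each forward inclusion, features are conditionally excluded as long as that yields a better subset than the best one found so far for the smaller size. This is a simplified reading of the algorithm, not pkfssvm's implementation, which may differ in details such as the stopping rule. The function names are hypothetical, and the criterion, which for pkfssvm would presumably be the cross-validated SVM accuracy on the candidate bands, is replaced by a hypothetical lookup table.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Criterion = Callable[[FrozenSet[int]], float]


def sffs(n_features: int, target: int, criterion: Criterion) -> List[int]:
    """Sequential floating forward search for `target` of `n_features`
    features, maximizing criterion(subset)."""
    if not 1 <= target <= n_features:
        raise ValueError("target must lie between 1 and n_features")
    current: FrozenSet[int] = frozenset()
    best: Dict[int, Tuple[float, FrozenSet[int]]] = {}  # size -> best (score, subset)

    def record(subset: FrozenSet[int], score: float) -> None:
        k = len(subset)
        if k not in best or score > best[k][0]:
            best[k] = (score, subset)

    while len(current) < target:
        # Inclusion: add the feature that yields the highest criterion.
        added = max((f for f in range(n_features) if f not in current),
                    key=lambda f: criterion(current | {f}))
        current = current | {added}
        record(current, criterion(current))

        # Conditional exclusion: drop the least useful feature (never the one
        # just added) while this beats the best subset of the smaller size.
        while len(current) > 2:
            worst = max((f for f in current if f != added),
                        key=lambda f: criterion(current - {f}))
            reduced = current - {worst}
            score = criterion(reduced)
            if score > best[len(reduced)][0]:
                current = reduced
                record(current, score)
            else:
                break
    return sorted(best[target][1])


# Hypothetical criterion values; unlisted subsets score 0.
TOY_SCORES = {
    frozenset({0}): 1.0, frozenset({1}): 0.8, frozenset({2}): 0.7, frozenset({3}): 0.1,
    frozenset({0, 1}): 1.5, frozenset({0, 2}): 1.4, frozenset({0, 3}): 1.1,
    frozenset({1, 2}): 3.0, frozenset({1, 3}): 0.9, frozenset({2, 3}): 0.9,
    frozenset({0, 1, 2}): 2.0, frozenset({0, 1, 3}): 1.7, frozenset({1, 2, 3}): 3.5,
}


def toy_criterion(subset: FrozenSet[int]) -> float:
    return TOY_SCORES.get(subset, 0.0)


print(sffs(4, 3, toy_criterion))  # [1, 2, 3]
```

On this toy table, plain sequential forward selection would keep feature 0 from its first step and end with [0, 1, 2]; the floating exclusion step drops feature 0 once the pair {1, 2} proves better, which is what makes SFFS less greedy.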
We show how to select 16 features from a training sample that is based on a high
dimensional dataset. The utilities pksvm, pkoptsvm and pkfssvm share many
command line options. As with the other utilities, the training vector dataset for pkfssvm must
contain both the labels and the band information for each sample unit. In this example,
we use a five-fold cross validation (set the option -cv 2 to speed up the process).
 
 