The kernel function simplifies the learning process by changing the representation of the data in the input space to a linear representation in a higher-dimensional space called a feature space. A suitable choice of kernel allows the data to become separable in the feature space despite being non-separable in the original input space. Four standard kernels are usually used in classification problems and also in regression cases: linear, polynomial, radial basis, and sigmoid:
$$
K(x, x') =
\begin{cases}
x^{T} x' & \text{linear} \\
\left( x^{T} x' + 1 \right)^{d} & \text{polynomial} \\
\exp\left( -\gamma \left\| x - x' \right\|^{2} \right) & \text{radial basis} \\
\tanh\left( \gamma \, x^{T} x' + C \right) & \text{sigmoid}
\end{cases}
\qquad (4.62)
$$
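To make Eq. (4.62) concrete, the short sketch below evaluates the four kernels for a pair of input vectors in NumPy. The function names and the parameter values (d, γ, and the sigmoid offset) are illustrative choices, not values taken from the case studies.

```python
import numpy as np

def linear_kernel(x, x_prime):
    # K(x, x') = x^T x'
    return np.dot(x, x_prime)

def polynomial_kernel(x, x_prime, d=3):
    # K(x, x') = (x^T x' + 1)^d
    return (np.dot(x, x_prime) + 1.0) ** d

def rbf_kernel(x, x_prime, gamma=0.5):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

def sigmoid_kernel(x, x_prime, gamma=0.5, c=1.0):
    # K(x, x') = tanh(gamma * x^T x' + c)
    return np.tanh(gamma * np.dot(x, x_prime) + c)

x = np.array([1.0, 2.0, 3.0])
x_prime = np.array([0.5, -1.0, 2.0])
for name, k in [("linear", linear_kernel), ("polynomial", polynomial_kernel),
                ("radial basis", rbf_kernel), ("sigmoid", sigmoid_kernel)]:
    print(f"{name}: {k(x, x_prime):.4f}")
```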
Currently, several types of support vector machine software are available. The software used in this project was LIBSVM, developed by Chih-Chung Chang and Chih-Jen Lin and supported by the National Science Council of Taiwan. The source code is written in C++. This software was chosen for the case studies in subsequent chapters because of its ease of use and dependability. The LIBSVM model is capable of C-SVM classification, one-class classification, ν-SV classification, ν-SV regression, and ε-SV regression. The software first trains the support vector machine with a list of input vectors describing the training data. Normalization of input vectors is important in SVM modeling.

In SVM modeling, we have performed analysis with ε-SVR and ν-SVR using different kernel functions such as linear, polynomial, radial, and sigmoid for three case studies in this book. These case studies chose different kernel and SVR models based on trial-and-error experiments. The performance of ε-SVR with a linear kernel was better than that of ν-SVR with a linear kernel in all case studies explained in Chaps. 5, 6 and 7.
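The workflow just described, normalizing the input vectors and then fitting both ε-SVR and ν-SVR with a linear kernel, can be sketched as follows. The sketch uses scikit-learn, whose SVR and NuSVR estimators are built on LIBSVM; the synthetic data and the parameter values are placeholders rather than the case-study data of Chaps. 5, 6 and 7.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR, NuSVR

# Synthetic stand-in for the training data (not the case-study data).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 100.0, size=(200, 4))      # input vectors
y = 0.3 * X[:, 0] - 0.1 * X[:, 2] + rng.normal(0.0, 1.0, size=200)

# Normalize the input vectors before SVM modeling.
X_scaled = MinMaxScaler().fit_transform(X)

# epsilon-SVR and nu-SVR with a linear kernel (illustrative parameter values).
eps_svr = SVR(kernel="linear", C=10.0, epsilon=0.01).fit(X_scaled, y)
nu_svr = NuSVR(kernel="linear", C=10.0, nu=0.5).fit(X_scaled, y)

print("epsilon-SVR R^2:", eps_svr.score(X_scaled, y))
print("nu-SVR R^2:", nu_svr.score(X_scaled, y))
```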
This book recommends further exploration of the modeling of the two selected SVR models, as the reason for the outperformance of one SVR over the other in our case studies is not clear. The SVM hypothesis suggests that the performance of SVM depends on the slack parameters (ν, ε) and the cost factor (C). We have performed the analysis varying the ε values from ε = 1 to ε = 0.00001 and the cost parameter from C = 0.1 to C = 1000 in different case studies. The cost factor of error (C) assigns a penalty for the number of vectors falling between the two hyperplanes in the hypothesis. This suggests that, if the data is of good quality, the distance between the two hyperplanes is narrowed down. If the data is noisy, it is preferable to have a smaller value of C, which will not penalize the vectors [14]. So it is important to find the optimum cost value for SVM modeling. To ascertain the optimum cost value, the support vector machine with a linear kernel was run for different iterations with different values of C in the three case studies of this book.
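A minimal sketch of the trial-and-error search over C and ε described above, using the parameter ranges quoted in the text. The synthetic data and the use of a held-out R² score as the selection criterion are assumptions; the original case studies evaluate the models on their own field data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic placeholder data, normalized as before.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(300, 4))
y = 0.3 * X[:, 0] - 0.1 * X[:, 2] + rng.normal(0.0, 1.0, size=300)
X = MinMaxScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

best = None
for C in (0.1, 1.0, 10.0, 100.0, 1000.0):                 # cost range quoted in the text
    for eps in (1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001):  # epsilon range quoted in the text
        score = SVR(kernel="linear", C=C, epsilon=eps).fit(X_train, y_train).score(X_test, y_test)
        if best is None or score > best[0]:
            best = (score, C, eps)

print(f"best R^2 = {best[0]:.4f} at C = {best[1]}, epsilon = {best[2]}")
```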
 