{⟨w · x_j⟩ + b : y_j = i} for i ∈ {−1, +1}, i.e., the standard deviations of the projection ⟨w · x⟩ + b computed separately on the two classes; the term λ‖w‖² has the aim of regularizing the norm of the weight vector. Also in this case, it is possible to rewrite the weight vector w as a linear combination of the training examples and to express the function g(x) in dual coordinates. For an explicit derivation see (27).
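Since the same dual rewriting applies across these regularized methods, here is a minimal sketch (our own illustration, using kernel ridge regression rather than the Fisher criterion; all names and data are ours) of evaluating g(x) = Σ_j α_j κ(x_j, x) without ever forming w explicitly:

    import numpy as np

    def rbf(x, z, gamma=1.0):
        # Gaussian kernel kappa(x, z) = exp(-gamma * ||x - z||^2).
        return np.exp(-gamma * np.sum((x - z) ** 2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))                      # training inputs x_j
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)   # training targets y_j

    # Kernel matrix K_ij = kappa(x_i, x_j) on the training set.
    K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

    # Dual coefficients of kernel ridge regression: alpha = (K + lam*I)^{-1} y,
    # so the weight vector w = sum_j alpha_j phi(x_j) never has to be built.
    lam = 0.1
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

    def g(x):
        # g(x) = sum_j alpha_j kappa(x_j, x): evaluation in dual coordinates.
        return sum(a * rbf(xj, x) for a, xj in zip(alpha, X))

    print(round(g(X[0]), 3), round(y[0], 3))          # fitted value vs. target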
The Perceptron procedure is guaranteed to converge only if there exists a hyperplane that correctly classifies the data; furthermore, the algorithm may give different results depending on the order in which the elements are processed (as the sketch below illustrates), since several different solutions exist. Fisher's Discriminant does not suffer from these problems: its solution is unique, because it finds the hyperplane (w, b) on which the projection of the data is maximally separated.
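To see the order dependence concretely, the following toy sketch (our own example, not from the text) trains the classical Perceptron with two different visiting orders and obtains two different separating hyperplanes:

    import numpy as np

    def perceptron(X, y, order, epochs=100):
        # Classical perceptron on labels y in {-1, +1}, visiting the
        # examples in the given order on each pass over the data.
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            mistakes = 0
            for j in order:
                if y[j] * (w @ X[j] + b) <= 0:   # misclassified: update
                    w += y[j] * X[j]
                    b += y[j]
                    mistakes += 1
            if mistakes == 0:                    # data correctly separated
                break
        return w, b

    # A linearly separable toy set.
    X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.5]])
    y = np.array([1, 1, -1, -1])

    print(perceptron(X, y, order=[0, 1, 2, 3]))  # one separating hyperplane
    print(perceptron(X, y, order=[3, 2, 1, 0]))  # a different one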
Fisher's Linear Discriminant (FDA), Partial Least Squares (PLS), Ridge Regression (RR), Principal Components Analysis (PCA), K-means and Spectral Clustering (SC), Canonical Correlation Analysis (CCA), Novelty Detection (ND), and many others can all be implemented in a dual form following the approaches outlined here. We refer the reader to (25; 18; 29; 6; 19; 1; 27) for more information on these methods, to (3) for a tutorial on kernel methods based on eigenvalue problems (PCA, CCA, PLS, FDA and SC), and to (33; 32) for two nice examples of the use of kernel methods in real-life problems.
Owing to the level of maturity already achieved in these algorithmic domains, the focus of kernel methods research has recently been shifting towards the design of kernels defined on general data types (such as strings, text, nodes of a graph, trees, graphs, ...). Major issues in kernel design are expressive power and efficiency of evaluation (10; 13; 30; 17; 12).
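As one concrete illustration of a kernel defined directly on strings, here is a minimal sketch of the k-spectrum kernel, which counts shared substrings of length k; it is the inner product of the two k-mer count vectors, hence a valid kernel (the example strings are our own):

    from collections import Counter

    def spectrum_kernel(s, t, k=3):
        # k-spectrum kernel: inner product of k-mer count vectors,
        # a kernel defined directly on strings rather than vectors.
        cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
        return sum(cs[m] * ct[m] for m in cs if m in ct)

    print(spectrum_kernel("kernel methods", "kernel design"))  # shared 3-mers
    print(spectrum_kernel("abcabc", "bcabca", k=2))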
1.2.2 Formal Properties of Kernel Functions
So far, the only way of verifying that a given function is a kernel has been to construct a feature space for which the function corresponds to first performing the feature mapping and then computing the inner product between the two images. An alternative method of demonstrating that a candidate function is a kernel is Mercer's Theorem, which provides a characterization of when a function κ(x, z) is a kernel. This is an important theoretical tool, useful both to create new kernels and to combine different kernels to form new ones.
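As a small numerical illustration of the feature-space construction (our own example), the candidate function κ(x, z) = (x · z)² on R² is a kernel, since it coincides with the inner product of the images under the explicit feature map φ(x) = (x₁², x₂², √2 x₁x₂):

    import numpy as np

    def kappa(x, z):
        # Candidate kernel: squared inner product.
        return (x @ z) ** 2

    def phi(x):
        # Explicit feature map on R^2 with <phi(x), phi(z)> = (x . z)^2.
        return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

    x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
    print(kappa(x, z), phi(x) @ phi(z))   # both equal (1*3 + 2*(-1))^2 = 1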
The kernel matrix K_ij = κ(x_i, x_j), formed by evaluating a kernel on all pairs of any set of inputs, is a positive semi-definite matrix.
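This property is easy to check numerically; the following sketch (with an arbitrary sample of inputs and a Gaussian kernel of our choosing) verifies that the resulting kernel matrix is symmetric with non-negative eigenvalues, up to floating-point tolerance:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(10, 4))
    # Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2).
    K = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))

    eigvals = np.linalg.eigvalsh(K)
    print(np.allclose(K, K.T), eigvals.min() >= -1e-10)   # True True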
Finitely Positive Semi-Definite Functions
A function κ : X × X → R
satisfies the finitely positive semi-definite property if it is a symmetric function
for which the matrices formed by restriction to any finite subset of the space
X are positive semi-definite. Note that this definition does not require the set
X to be a vector space.
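In particular, X may be a finite set of strings: restricting the k-spectrum kernel sketched earlier to such a set still yields a positive semi-definite matrix (a toy check of our own):

    import numpy as np
    from collections import Counter

    def kmer_kernel(s, t, k=2):
        # Inner product of k-mer counts: a symmetric kernel on strings.
        cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
        return sum(cs[m] * ct[m] for m in cs if m in ct)

    # X is a finite set of strings -- not a vector space.
    strings = ["banana", "bandana", "ananas", "kernel"]
    K = np.array([[kmer_kernel(s, t) for t in strings] for s in strings])
    print(np.linalg.eigvalsh(K).min() >= -1e-10)   # PSD on this finite subset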