A Gaussian process is completely defined by its second-order statistics. The mean
function and the covariance function of a Gaussian process can be defined as follows:

$$m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})], \qquad (2)$$

$$k(\mathbf{x}, \mathbf{x}') = \mathbb{E}\big[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))\big], \qquad (3)$$

and $f(\mathbf{x}) \sim \mathcal{GP}\big(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\big)$. Without loss of generality, it is common to consider
GPs with mean function $m(\mathbf{x}) = 0$ to simplify the derivation. In this case, a GP is
fully specified given its covariance function.
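As a minimal sketch of this point, a zero-mean GP prior can be sampled once its covariance function is fixed: evaluating the covariance on a set of inputs gives a Gaussian whose draws are function values. The SE kernel and the single length-scale used here are illustrative choices, not specifics from the text.

```python
# Sketch: a zero-mean GP is fully specified by its covariance function,
# so sampling the prior at chosen inputs only requires the covariance
# matrix of those inputs. The SE kernel here is an illustrative choice.
import numpy as np

def se_cov(X, length_scale=1.0):
    """Covariance matrix K_ij = exp(-0.5 * (x_i - x_j)^2 / l^2) for 1-D inputs."""
    diff = X[:, None] - X[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 50)
K = se_cov(X) + 1e-9 * np.eye(len(X))  # small jitter for numerical stability
# A draw from the GP prior at inputs X is a draw from N(0, K)
f = rng.multivariate_normal(np.zeros(len(X)), K)
```

Each call produces one random function evaluated at the input grid; the smoothness of the draws is controlled entirely by the covariance function.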
Since it is infeasible to consider all possible random functions, certain assumptions
must be made when making inference. By restricting the underlying function to be
distributed as a GP, the number of choices is reduced. Furthermore, a Gaussian
predictive distribution can be derived in closed form under such an assumption. If we
consider only zero-mean Gaussian processes, then for a test input $\mathbf{x}_*$, the mean and
variance of the predictive distribution can be computed as follows [16]:

$$\mu_* = \mathbf{k}_*^\top K^{-1} \mathbf{y}, \qquad (4)$$

$$\sigma_*^2 = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top K^{-1} \mathbf{k}_*, \qquad (5)$$

where $\mathbf{k}_* = [k(\mathbf{x}_*, \mathbf{x}_1), \ldots, k(\mathbf{x}_*, \mathbf{x}_n)]^\top$, $K$ is the covariance matrix of the training
input vectors with $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, and $\mathbf{y} = [y_1, \ldots, y_n]^\top$ is the vector of training
target values. The point prediction is commonly taken to be the mean of the predictive
distribution, i.e. $\hat{y}_* = \mu_*$.
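The predictive equations above can be sketched directly in code. This is a minimal illustration, assuming an SE kernel with per-dimension length-scales and a small jitter term added to $K$ for numerical stability; the function and variable names are my own, not from the text.

```python
# Sketch of zero-mean GP prediction, Eqs. (4)-(5), assuming an SE kernel.
import numpy as np

def se_kernel(x1, x2, length_scales):
    """Squared exponential kernel with per-dimension length-scales, Eq. (6)."""
    d = (x1 - x2) / length_scales
    return np.exp(-0.5 * np.dot(d, d))

def gp_predict(X_train, y_train, x_star, length_scales, jitter=1e-6):
    """Predictive mean and variance of a zero-mean GP at test input x_star."""
    n = X_train.shape[0]
    # Covariance matrix of training inputs, K_ij = k(x_i, x_j)
    K = np.array([[se_kernel(X_train[i], X_train[j], length_scales)
                   for j in range(n)] for i in range(n)])
    K += jitter * np.eye(n)  # small jitter keeps K well-conditioned
    # Vector of covariances between x_star and the training inputs
    k_star = np.array([se_kernel(x_star, X_train[i], length_scales)
                       for i in range(n)])
    mean = k_star @ np.linalg.solve(K, y_train)                 # Eq. (4)
    var = (se_kernel(x_star, x_star, length_scales)
           - k_star @ np.linalg.solve(K, k_star))               # Eq. (5)
    return mean, var
```

At a training input the predictive mean nearly reproduces the training target and the predictive variance collapses toward zero, as expected from Eqs. (4) and (5).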
4.2 Weight-Sharing Kernel
The covariance function is also called the kernel of a GP. As previously mentioned, the
behavior of a zero-mean GP is fully specified by its covariance function. For
regression problems, we want to assign similar prediction values to two input vectors that
are close in input space. In other words, if two similar time series are observed, the model
should give similar predictions. A widely used kernel possessing this property
is the radial basis function (RBF) kernel, which is called the squared exponential (SE) kernel
in the GP literature. It has the form

$$k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{1}{2} \sum_{d=1}^{D} \frac{(x_d - x'_d)^2}{l_d^2}\right), \qquad (6)$$

where $x_d$ is the $d$-th dimension of vector $\mathbf{x}$ and $D$ is the dimension of the input vectors.
There are $D$ hyperparameters $l_1, \ldots, l_D$ for this kernel. These hyperparameters
are called the characteristic length-scales. Each length-scale serves as a distance measure
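A small sketch can illustrate how the length-scales act as per-dimension distance measures in Eq. (6): the larger $l_d$ is, the more a difference in dimension $d$ is discounted when comparing two inputs. The concrete inputs and length-scale values below are illustrative assumptions.

```python
# Illustration of the characteristic length-scales in Eq. (6): a large l_d
# discounts differences in dimension d, making that dimension matter less.
import numpy as np

def se_kernel(x1, x2, length_scales):
    """Squared exponential kernel with per-dimension length-scales, Eq. (6)."""
    d = (np.asarray(x1) - np.asarray(x2)) / np.asarray(length_scales)
    return float(np.exp(-0.5 * np.dot(d, d)))

a = [0.0, 0.0]
b = [0.0, 3.0]  # differs from a only in the second dimension

k_short = se_kernel(a, b, [1.0, 1.0])    # short l_2: inputs look dissimilar
k_long  = se_kernel(a, b, [1.0, 100.0])  # long l_2: the difference is discounted
```

With the short length-scale the covariance is close to zero, while with the long length-scale it stays close to one: the same pair of inputs is judged dissimilar or similar depending on the length-scale in the differing dimension.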