Table 1.1 Some kernel definitions

Kernel         $K(t)$
Gaussian       $\frac{1}{\sqrt{2\pi}}\, e^{-(1/2)t^2}$
Rectangular    $\frac{1}{2}$ for $|t| < 1$, 0 otherwise
Triangular     $1 - |t|$ for $|t| < 1$, 0 otherwise
Epanechnikov   $\frac{3}{4}\bigl(1 - \frac{1}{5}t^2\bigr)/\sqrt{5}$ for $|t| < \sqrt{5}$, 0 otherwise
Biweight       $\frac{15}{16}(1 - t^2)^2$ for $|t| < 1$, 0 otherwise
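The kernels in Table 1.1 plug into the usual kernel density estimator $\hat{P}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\bigl((x - x_i)/h\bigr)$. A minimal sketch, assuming this standard estimator form (the estimator itself is not spelled out in this excerpt) and an illustrative bandwidth `h`:

```python
import math

# Two of the kernels from Table 1.1; t is the scaled distance (x - x_i)/h.
def gaussian(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def epanechnikov(t):
    s = math.sqrt(5.0)
    return (3.0 / (4.0 * s)) * (1.0 - t * t / 5.0) if abs(t) < s else 0.0

def kde(x, sample, h, kernel=gaussian):
    """Kernel density estimate: (1 / (n*h)) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)
```

The bounded-support kernels (rectangular, triangular, Epanechnikov, biweight) simply return 0 outside their interval, so distant sample points contribute nothing to the estimate.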
1.1.2 Semi-Supervised Learning
Traditionally, there are two different types of tasks in machine learning: supervised and unsupervised learning. For unsupervised learning, there is a sample $\{x_i\}$ of patterns drawn independently and identically distributed (i.i.d.) from some unknown data distribution with density $P(x)$ that has to be estimated. Supervised learning consists of estimating a functional relationship $x \to y$ between a covariate $x$ and a class variable $y \in \{1, \ldots, M\}$, with the goal of minimizing a functional of the joint data distribution $P(x, y)$, such as the probability of classification error. The marginal data distribution $P(x)$ is referred to as the input distribution. Classification can be treated as a special case of estimating the joint density $P(x, y)$.
Unsupervised learning can be considered as a density estimation technique. Many techniques for density estimation propose a latent (unobserved) class variable $y$ and estimate $P(x)$ as the mixture distribution $P(x) = \sum_{i} P(x \mid y = i)\, P(y = i)$. Note that the role of $y$ in unsupervised learning is a modelling device, rather than something related to observable reality, which is the usual role of $y$ in classification.
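The mixture formula can be made concrete with a hypothetical two-component Gaussian mixture; the priors and component parameters below are invented purely for illustration:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical latent-class model: priors P(y) and class-conditionals P(x | y).
priors = {0: 0.3, 1: 0.7}
components = {0: (-2.0, 1.0), 1: (2.0, 1.0)}  # (mean, std) per latent class

def marginal(x):
    """P(x) = sum_i P(x | y = i) P(y = i)."""
    return sum(priors[y] * normal_pdf(x, *components[y]) for y in priors)
```

Here $y$ is never observed; it exists only so that $P(x)$ can be written as a weighted sum of simple class-conditional densities.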
The semi-supervised learning (SSL) problem could be considered as belonging to either of the two previous categories of learning. If the goal is to minimize the classification error, semi-supervised learning would be a supervised task; however, if the goal is to estimate $P(x)$, it would be an unsupervised task. In the latter case, the problem places more emphasis on the density estimation, and the labelled data are treated as an auxiliary resource. Thus, "semi-unsupervised learning" would be a more suitable name for this task. The difference from a standard classification setting is that, along with a labelled sample $D_l = \{(x_i, y_i) \mid i = 1, \ldots, n\}$ drawn i.i.d. from $P(x, y)$, there is also access to an additional unlabelled sample $D_u = \{x_{n+j} \mid j = 1, \ldots, m\}$ from the marginal $P(x)$. Of special interest are the cases where $m \gg n$, which may arise in situations where obtaining an unlabelled sample is cheap and easy, while labelling the sample is expensive or difficult. We denote $X_l = (x_1, \ldots, x_n)$, $Y_l = (y_1, \ldots, y_n)$ and $X_u = (x_{n+1}, \ldots, x_{n+m})$. The unobserved labels are denoted $Y_u = (y_{n+1}, \ldots, y_{n+m})$ [6].
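This labelled/unlabelled setup can be sketched in code. The two-class synthetic data below are invented for illustration; the point is only the shape of the split, with a small labelled sample and a much larger unlabelled one:

```python
import random

random.seed(0)

# Hypothetical data: two Gaussian classes. In practice the labels would come
# from an expensive annotation process, so we keep only a few of them.
data = [(random.gauss(0.0, 1.0), 0) for _ in range(500)] \
     + [(random.gauss(3.0, 1.0), 1) for _ in range(500)]
random.shuffle(data)

n = 20                              # size of the labelled sample
D_l = data[:n]                      # pairs (x_i, y_i) drawn from P(x, y)
D_u = [x for x, _ in data[n:]]      # only the inputs x_{n+j}, from the marginal P(x)
m = len(D_u)                        # here m = 980 >> n = 20
```

Discarding the labels of `data[n:]` mimics the typical SSL regime: the inputs remain usable for estimating $P(x)$ even though their labels are unavailable.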