predicts the value of $Y_i$. We assume that each set of features $X_i$ is partitioned into a set of base features $X_B$, which are common to all learning tasks $t_i \in T$, and a set of constructed features $X_i \setminus X_B$.
We now introduce a very simple model of feature relevance and interaction. The feature $X_{ik}$ is assumed to be irrelevant for a learning task $t_i$ if it does not improve the classification accuracy:
Definition 16. A feature $X_{ik}$ is called irrelevant for a learning task $t_i$ iff $X_{ik}$ is not correlated with the target feature $Y_i$, that is, iff $\Pr(Y_i \mid X_{ik}) = \Pr(Y_i)$. The set of all irrelevant features for a learning task $t_i$ is denoted by $IF_i$.
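Definition 16 can be checked empirically by estimating both distributions from data. A minimal sketch, assuming discrete feature and class values (the function name and the tolerance `tol` are ours, not the chapter's):

```python
from collections import Counter, defaultdict

def is_irrelevant(x, y, tol=0.05):
    """Empirical check of Definition 16: X is irrelevant iff
    Pr(Y | X) = Pr(Y) for every value of X (up to `tol`).
    x, y: equal-length sequences of discrete values."""
    n = len(y)
    marginal = {c: cnt / n for c, cnt in Counter(y).items()}
    by_value = defaultdict(list)
    for xv, yv in zip(x, y):
        by_value[xv].append(yv)
    for labels in by_value.values():
        cond = Counter(labels)
        for c, p in marginal.items():
            # deviation between Pr(Y=c | X=xv) and Pr(Y=c)
            if abs(cond[c] / len(labels) - p) > tol:
                return False
    return True
```

On finite samples the equality $\Pr(Y_i \mid X_{ik}) = \Pr(Y_i)$ only ever holds approximately, hence the tolerance; a chi-squared independence test would be a more principled replacement.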
Two features $X_{ik}$ and $X_{il}$ are alternative for a learning task $t_i$, denoted by $X_{ik} \sim X_{il}$, if they can be replaced by each other without affecting the classification accuracy. For linear learning schemes this leads to a linear correlation of the two features:
Definition 17. Two features $X_{ik}$ and $X_{il}$ are called alternative for a learning task $t_i$ (written as $X_{ik} \sim X_{il}$) iff $X_{il} = a + b \cdot X_{ik}$ with $b > 0$.
This is a very limited definition of alternative features. However, we will show that most weighting algorithms are already ruled out by conditions based on this simple definition.
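Definition 17 amounts to a perfect positive linear relation between the two features. A minimal numeric check under that reading (the function name and tolerance are ours):

```python
import numpy as np

def are_alternative(x_k, x_l, eps=1e-9):
    """Definition 17: X_l = a + b * X_k with b > 0.
    Fits the best line and verifies slope and residuals."""
    x_k = np.asarray(x_k, dtype=float)
    x_l = np.asarray(x_l, dtype=float)
    b, a = np.polyfit(x_k, x_l, 1)            # slope, intercept
    residual = np.max(np.abs(a + b * x_k - x_l))
    return b > 0 and residual < eps

# Example: the same loudness feature on two linear scales
# are_alternative([1.0, 2.0, 3.0], [2.5, 4.5, 6.5])  -> True (a=0.5, b=2)
```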
Learning Task Similarity for Audio Classification
We assume that a learning task $t_i$ is completely represented by a feature weight vector $w_i$. The vector $w_i$ is calculated from the base features $X_B$ only. This representation of learning tasks is motivated by the idea that a given learning scheme approximates similar constructed features by a set of base features in a similar way: for example, if the constructed feature $\sin(X_{ik} \cdot X_{il})$ is highly relevant, the features $X_{ik}$ and $X_{il}$ are relevant as well.
Our approach works as follows: for a given learning task $t_i$ we first calculate the relevance of all base features $X_B$. We then send a query to all other nodes in the network by range-limited broadcast. Each node compares the weight vector to the weight vectors representing the locally stored tasks and corresponding feature sets. This comparison is based on a function $d(t_i, t_j)$. Finally, we create a set of constructed features as the union of the constructed features associated with these tasks. This set is then evaluated on the learning task $t_i$. If the performance gain is sufficiently high (above a given threshold), we store task $t_i$ as an additional case. Otherwise, the constructed features are only used as an initialization for a feature construction that is performed locally (Section 4). If this leads to a sufficiently high increase in performance, the task $t_i$ is also stored in the local case base along with the locally generated features.
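The loop just described can be summarized in a short sketch. All names below are ours: `weights_of`, `evaluate`, `construct_locally`, and the peer interface stand in for the weighting scheme, the learner's accuracy estimate, the local feature construction of Section 4, and the range-limited broadcast.

```python
def process_task(task, peers, case_base, d, weights_of, evaluate,
                 construct_locally, sim_threshold, gain_threshold):
    """Sketch of the case-based loop described above (all names ours):
      weights_of(task)              -> relevance weight vector on X_B
      d(w_i, w_j)                   -> learning task distance
      evaluate(task, feature_set)   -> estimated classification accuracy
      construct_locally(task, seed) -> feature construction of Section 4
    task.base_features is assumed to be a set of feature identifiers."""
    w = weights_of(task)
    # query all reachable nodes; keep the cases whose stored weight
    # vectors are close to w under d
    cases = [c for p in peers for c in p.stored_cases
             if d(w, c.weights) <= sim_threshold]
    # union of the constructed features associated with these tasks
    candidates = set().union(*(c.constructed_features for c in cases))
    baseline = evaluate(task, task.base_features)
    if evaluate(task, task.base_features | candidates) - baseline >= gain_threshold:
        case_base.append((task, w, candidates))    # store as an additional case
    else:
        # use the candidates only to initialize local feature construction
        feats = construct_locally(task, seed=candidates)
        if evaluate(task, task.base_features | feats) - baseline >= gain_threshold:
            case_base.append((task, w, feats))
```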
While feature weighting and feature construction are well-studied tasks, the core of our algorithm is the calculation of $d$ using only the relevance values of the base features $X_B$. In a first step, we define a set of conditions which must be met by feature weighting schemes. In a second step, a set of conditions for the learning task distance is defined which makes use of the weighting conditions.
Weighting Conditions 1. Let $w$ be a weighting function $w: X_B \to \mathbb{R}$. Then the following must hold:

(W1) $w(X_{ik}) = 0$ if $X_{ik} \in X_B$ is irrelevant.

(W2) Let $X_i \subseteq X_B$ be a set of alternative features. Then

$$\forall S \subset X_i,\ S \neq \emptyset:\quad \sum_{X_k \in S} w(X_k) \;=\; \sum_{X_k \in X_i} w(X_k) \;=\; \hat{w}.$$
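We read (W2) as requiring that the total weight assigned to a group of alternative features is stable: whichever non-empty subset of the alternatives is actually present, their weights must sum to the same constant $\hat{w}$. Under that reading, a candidate weighting scheme can be tested mechanically; the harness below is our own sketch, where `weighting` maps a feature list to a weight per feature.

```python
import itertools

def satisfies_w2(weighting, base_features, alternatives, tol=1e-9):
    """Test condition (W2) for a weighting scheme.
    weighting(features) -> {feature: weight}, computed on that feature set.
    For every non-empty subset S of the alternative group, the weights of
    S (with the non-alternative features still present) must sum to the
    same value w_hat as the full group does."""
    others = [f for f in base_features if f not in alternatives]
    full = weighting(base_features)
    w_hat = sum(full[f] for f in alternatives)
    for r in range(1, len(alternatives)):
        for S in itertools.combinations(alternatives, r):
            ws = weighting(others + list(S))
            if abs(sum(ws[f] for f in S) - w_hat) > tol:
                return False
    return True
```

As the text notes, this seemingly mild requirement already rules out most common weighting algorithms: any scheme that assigns each of several interchangeable features its full individual weight violates the constant-sum property.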