predicts the value of Y_i. We assume that each set of features X_i is partitioned into a set of base features X_B, which are common for all learning tasks t_i ∈ T, and a set of constructed features X_i \ X_B.
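As a small, purely illustrative sketch of this partition (the audio feature names and task labels below are invented, not taken from the text), each task shares the base set X_B and adds its own constructed features:

```python
# Toy illustration of the feature partition (all names invented for the example).
X_B = {"zero_crossing_rate", "spectral_centroid", "rms_energy"}   # base features shared by all tasks

tasks = {
    "t1": X_B | {"sin(zcr * centroid)"},                # X_1 = X_B plus constructed features
    "t2": X_B | {"log(rms_energy)", "centroid ** 2"},   # X_2 with a different constructed part
}

for name, X_i in tasks.items():
    print(name, "constructed part X_i \\ X_B:", X_i - X_B)
```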
Our approach works as follows: for a given learning task t_i we first calculate the relevance of all base features X_B. We then send a query to all other nodes in the network by range-limited broadcast. Each node compares this weight vector to the weight vectors representing the locally stored tasks and corresponding feature sets. This comparison is based on a function d(t_i, t_j). Finally, we create a set of constructed features as the union of the constructed features associated with these tasks.
This set is then evaluated on the learning task t_i. If the performance gain is sufficiently high (above a given threshold), we store task t_i as an additional case. Otherwise, the constructed features are only used as an initialization for a feature construction that is performed locally (Section 4). If this leads to a sufficiently high increase in performance, the task t_i is also stored in the local case base along with the locally generated features.
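The procedure can be summarized in a short sketch. Everything below is an assumed interface: the helpers compute_weights, peers, d, evaluate and construct_locally, the dictionary keys, and the two threshold values are placeholders for illustration, not part of the original text.

```python
def case_based_feature_construction(t_i, peers, compute_weights, d, evaluate,
                                     construct_locally, case_base,
                                     max_distance=0.5, threshold=0.02):
    """Sketch of the procedure described above; all interfaces are assumed."""
    # 1. Represent t_i by the relevance weights of the shared base features X_B.
    w_i = compute_weights(t_i)

    # 2. Query the reachable nodes; each stored case carries the weight vector and
    #    the constructed features of a task t_j. The comparison is based on
    #    d(t_i, t_j), sketched here as a distance between the two weight vectors.
    similar = [case for case in peers(w_i) if d(w_i, case["weights"]) <= max_distance]

    # 3. Union of the constructed features associated with the similar tasks.
    candidates = set().union(*(case["constructed"] for case in similar))

    # 4. Evaluate the transferred features on t_i against the base features alone.
    baseline = evaluate(t_i, set())
    if evaluate(t_i, candidates) - baseline > threshold:
        case_base.append((w_i, candidates))       # store t_i as an additional case
        return candidates

    # 5. Otherwise use the candidates only to initialize local feature construction.
    local = construct_locally(t_i, init=candidates)
    if evaluate(t_i, local) - baseline > threshold:
        case_base.append((w_i, local))            # store t_i with the locally built features
    return local
```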
We now introduce a very simple model of feature relevance and interaction. The feature X_ik is assumed to be irrelevant for a learning task t_i if it does not improve the classification accuracy:

Definition 16. A feature X_ik is called irrelevant for a learning task t_i iff X_ik is not correlated with the target feature Y_i, that is, if Pr(Y_i | X_ik) = Pr(Y_i). The set of all irrelevant features for a learning task t_i is denoted by IF_i.
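On a finite sample, Definition 16 can only be checked approximately; a minimal sketch for discrete features, assuming empirical estimates of Pr(Y) and Pr(Y | X) and an arbitrary tolerance:

```python
from collections import Counter

def is_irrelevant(x_values, y_values, tol=1e-9):
    """Empirical check of Definition 16: X is irrelevant iff Pr(Y | X) = Pr(Y)."""
    n = len(y_values)
    p_y = {y: c / n for y, c in Counter(y_values).items()}          # Pr(Y)
    for v in set(x_values):
        idx = [i for i, x in enumerate(x_values) if x == v]
        p_y_given_x = {y: c / len(idx)
                       for y, c in Counter(y_values[i] for i in idx).items()}
        # Pr(Y | X = v) must equal Pr(Y) for every value v of X.
        if any(abs(p_y_given_x.get(y, 0.0) - p) > tol for y, p in p_y.items()):
            return False
    return True

# The target has the same distribution within every value of X, so X is irrelevant.
x = [0, 0, 1, 1, 0, 0, 1, 1]
y = ["a", "b", "a", "b", "a", "b", "a", "b"]
print(is_irrelevant(x, y))   # True
```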
Two features X_ik and X_il are alternative for a learning task t_i, denoted by X_ik ~ X_il, if they can be replaced by each other without affecting the classification accuracy. For linear learning schemes this leads to the linear correlation of the two features:
Definition 17. Two features X_ik and X_il are called alternative for a learning task t_i (written as X_ik ~ X_il) iff X_il = a + b · X_ik with b > 0.

This is a very limited definition of alternative features. However, we will show that most weighting algorithms are already ruled out by conditions based on this simple definition.
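For non-constant features, Definition 17 is equivalent to the two features having a Pearson correlation of exactly +1 (a positive linear relationship), which is easy to test numerically; the data and tolerance below are arbitrary:

```python
import numpy as np

def are_alternative(x_k, x_l, tol=1e-9):
    """Def. 17: X_il = a + b * X_ik with b > 0  <=>  Pearson correlation of +1."""
    r = np.corrcoef(x_k, x_l)[0, 1]
    return r > 1.0 - tol

x_k = np.array([1.0, 2.0, 3.0, 4.0])
print(are_alternative(x_k, 0.5 + 2.0 * x_k))   # True: a = 0.5, b = 2.0 > 0
print(are_alternative(x_k, 5.0 - 1.0 * x_k))   # False: b < 0
```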
LEARNING TASK SIMILARITY FOR AUDIO CLASSIFICATION
While feature weighting and feature construction are well-studied tasks, the core of our algorithm is the calculation of d using only the relevance values of the base features X_B. In a first step, we define a set of conditions which must be met by feature weighting schemes. In a second step, a set of conditions for learning task distance is defined which makes use of the weighting conditions.
We assume that a learning task t_i is completely represented by a feature weight vector w_i. The vector w_i is calculated from the base features X_B only. This representation of learning tasks is motivated by the idea that a given learning scheme approximates similar constructed features by a set of base features in a similar way; for example, if the constructed feature sin(X_ik · X_il) is highly relevant, the features X_ik and X_il are relevant as well.
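To make this concrete, each task can be mapped to a weight vector over the shared base features; the weighting used below (absolute correlation with the target, on simulated data) is only an assumed stand-in, since the admissible weighting functions are constrained by the conditions that follow:

```python
import numpy as np

def weight_vector(X_base, y):
    """Hypothetical weighting: absolute correlation of each base feature with the target."""
    return np.array([abs(np.corrcoef(X_base[:, j], y)[0, 1]) for j in range(X_base.shape[1])])

rng = np.random.default_rng(0)
X_base = rng.normal(size=(200, 3))                                   # three shared base features
y_1 = (X_base[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(float)  # target of task t_1
y_2 = (X_base[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(float)  # target of task t_2

w_1, w_2 = weight_vector(X_base, y_1), weight_vector(X_base, y_2)
print(np.round(w_1, 2), np.round(w_2, 2))  # both tasks now live in the same weight space over X_B
```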
Weighting Conditions 1. Let w be a weighting function w : X_B → ℝ. Then the following must hold:

(W1) w(X_ik) = 0 if X_ik ∈ X_B is irrelevant.

(W2) Let X̂ ⊆ X_B be a set of alternative features. Then

for all S ⊆ X̂, S ≠ ∅: ∑_{X_k ∈ S} w(X_k) = ∑_{X_k ∈ X̂} w(X_k) = ŵ.
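Condition (W2) in particular rules out many common schemes: if every feature is weighted independently, the weight of a concept is counted once per alternative copy instead of once overall. A small sketch (toy data; the naive correlation weighting is again only an assumed example) showing such a violation:

```python
import numpy as np

def corr_weights(X, y):
    """Naive per-feature weighting: absolute correlation with the target (illustrative only)."""
    return np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = x + 0.1 * rng.normal(size=500)           # the target depends on x
X_all = np.column_stack([x, 2.0 * x + 1.0])  # columns 0 and 1 are alternative features (Def. 17)
X_sub = X_all[:, :1]                         # only one of the alternatives present

# (W2) requires the summed weight of the alternative group to stay constant (= w_hat),
# but the naive scheme roughly doubles it when both alternatives are present.
print(round(corr_weights(X_all, y).sum(), 2), round(corr_weights(X_sub, y).sum(), 2))
```

A weighting scheme satisfying (W2) would instead have to distribute the total weight ŵ over the alternative features so that the group sum stays constant.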