3.1 The Model and the Definition of the Problem
In ICA mixture modelling, it is assumed that the feature (observation) vectors $x_k$ corresponding to a given class $C_k$ $(k = 1 \ldots K)$ are the result of applying a linear transformation $A_k$ to a (source) vector $s_k$, whose elements are independent random variables, plus a bias vector $b_k$, i.e.,

$$x_k = A_k s_k + b_k, \qquad k = 1 \ldots K \qquad (3.1)$$
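As a concrete illustration of Eq. (3.1), here is a minimal NumPy sketch that draws synthetic observations from the model; the mixing matrices, bias vectors and the Laplacian source distribution are illustrative assumptions, not prescribed by the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_class(A_k, b_k, n):
    """Draw n observations x = A_k s_k + b_k for one class, Eq. (3.1)."""
    dim = A_k.shape[0]                   # A_k is square: dim(x) == dim(s)
    S = rng.laplace(size=(dim, n))       # independent source components
    return A_k @ S + b_k[:, None]        # mix linearly and add the bias

# Hypothetical two-class, two-dimensional example
A = [np.array([[1.0, 0.5], [0.2, 1.0]]), np.array([[0.8, -0.3], [0.4, 0.9]])]
b = [np.zeros(2), np.array([3.0, 3.0])]
X = [sample_class(A[k], b[k], 500) for k in range(2)]
```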
We assume that $A_k$ is a square matrix: feature and source vectors have the same dimension. This is a practical assumption, since original feature vectors are normally subjected to PCA and only the main (uncorrelated) components are retained for ICA, with no further reduction in the dimension of the new feature vector that is obtained in this way. An optimum classification of a given feature vector $x$ of unknown class is made by selecting the class $C_k$ that has the maximum conditional probability $p(C_k/x)$. Considering Bayes' theorem, we can write:
$$p(C_k/x) = \frac{p(x/C_k)\, p(C_k)}{p(x)} = \frac{p(x/C_k)\, p(C_k)}{\sum_{k'=1}^{K} p(x/C_{k'})\, p(C_{k'})} \qquad (3.2)$$
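In code, Eq. (3.2) reduces to normalizing the products of class-conditional likelihoods and priors. A minimal NumPy sketch (the numeric values in the example are placeholders):

```python
import numpy as np

def posteriors(likelihoods, priors):
    """Bayes' theorem, Eq. (3.2): posterior p(C_k/x) for every class."""
    joint = np.asarray(likelihoods) * np.asarray(priors)   # p(x/C_k) p(C_k)
    return joint / joint.sum()                             # normalize over all K classes

# Example: posteriors([0.02, 0.05], [0.5, 0.5]) -> array([0.2857..., 0.7143...])
```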
On the other hand, noting Eq. (3.1), if $x$ is a feature vector corresponding to class $C_k$, then [4]

$$p(x/C_k) = |\det A_k^{-1}|\, p(s_k) \qquad (3.3)$$

where $s_k = A_k^{-1}(x - b_k)$. Considering Eqs. (3.2) and (3.3), we can write

$$p(C_k/x) = \frac{|\det A_k^{-1}|\, p(s_k)\, p(C_k)}{\sum_{k'=1}^{K} |\det A_{k'}^{-1}|\, p(s_{k'})\, p(C_{k'})} \qquad (3.4)$$
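Eqs. (3.3) and (3.4) turn classification into a per-class score. A sketch of the numerator of Eq. (3.4), assuming the joint source pdf is available as a callable `source_pdf` (a hypothetical name; its estimation is the subject of what follows):

```python
import numpy as np

def class_score(x, A_k, b_k, source_pdf, prior_k):
    """Numerator of Eq. (3.4): |det A_k^{-1}| p(s_k) p(C_k)."""
    A_inv = np.linalg.inv(A_k)
    s_k = A_inv @ (x - b_k)                 # recover the source vector, Eq. (3.3)
    return abs(np.linalg.det(A_inv)) * source_pdf(s_k) * prior_k
```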
In conclusion, given a feature vector $x$, we should compute the corresponding source vectors $s_k$, $k = 1 \ldots K$, from $s_k = A_k^{-1}(x - b_k)$, and finally select the class having the maximum computed value of $|\det A_k^{-1}|\, p(s_k)\, p(C_k)$. (Note that the denominator in Eq. (3.4) does not depend on $k$, so it does not influence the maximization of $p(C_k/x)$.) To make the above computation, we need to estimate $A_k$ and $b_k$ (to compute $s_k$ from $x$) and the multidimensional pdf of the source vectors for every class (to compute $p(s_k)$). Two assumptions are considered to solve this problem. The first is that the elements of $s_k$ are independent random variables (the ICA assumption), so that the multidimensional pdf can be factored into the corresponding marginal pdfs of the vector elements. The second is that a set of independent feature vectors (learning vectors) is available, represented by the matrix $X = [x(1) \ldots x(N)]$. We consider a hybrid situation where the classes of a few learning vectors are known (supervised learning), while the classes of the others are unknown.
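Under the first assumption, $p(s_k)$ is a product of one-dimensional marginal pdfs, which makes the classification rule straightforward to implement once $A_k$, $b_k$ and the marginals have been estimated from the learning set. A minimal sketch assuming those estimates are already available (all names and the Laplacian marginals in the usage note are illustrative):

```python
import numpy as np

def classify(x, A, b, marginal_pdfs, priors):
    """Select argmax_k of the numerator of Eq. (3.4).

    marginal_pdfs[k] holds one 1-D pdf per source component; under the
    ICA assumption their product is the joint pdf p(s_k).
    """
    scores = []
    for A_k, b_k, pdfs, prior in zip(A, b, marginal_pdfs, priors):
        A_inv = np.linalg.inv(A_k)
        s = A_inv @ (x - b_k)                              # s_k = A_k^{-1}(x - b_k)
        p_s = np.prod([pdf(s_i) for pdf, s_i in zip(pdfs, s)])
        scores.append(abs(np.linalg.det(A_inv)) * p_s * prior)
    return int(np.argmax(scores))    # the denominator of Eq. (3.4) is common to all k

# Usage with illustrative unit-Laplacian marginals in two dimensions:
# laplace = lambda t: 0.5 * np.exp(-abs(t))
# k_hat = classify(x, [A_1, A_2], [b_1, b_2], [[laplace, laplace]] * 2, [0.5, 0.5])
```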