Stable Feature Extraction Using a Stochastic Information Measure (Pattern Recognition and Machine Learning)

Abstract

This article discusses the problem of extracting a set of pattern features that is both informative and stable with respect to stochastic noise. This is done by means of a stochastic information measure.

Introduction

When describing a nondeterministic system, we must take the nature of its uncertainty into account. In pattern recognition, uncertainty may have a probabilistic nature, defined by some additive measure, or a more "imprecise" nature, which can be described, for example, by monotone nonadditive measures. Feature extraction is an important task in pattern recognition in general and in image analysis in particular. It consists in extracting, from a set of features with given informativeness, some minimal subset of features (or combinations of them) whose integral informativeness is sufficient for solving a given recognition problem. There are traditional approaches to this problem, such as correlation analysis (principal component analysis, etc.) and discriminant analysis [2].

In this article we make the general problem of informative feature extraction concrete as follows. Suppose that a pattern X is defined by an ordered set X = {x1, ..., xn}. We call every subset A ⊆ X a representation of the pattern X. The problem is to find a representation A that is minimal on the one hand and close to X on the other. The elements of X may have different priorities; in other words, they may have different informativeness. Therefore we associate with each element x ∈ X a nonnegative number, a feature ω(x), that characterizes the importance of the element x for the representation of the pattern X. We determine the degree of representativeness of a set A ⊆ X for the pattern X by means of a set function μ and require μ to satisfy all the axioms of a monotone measure: 1) μ(∅) = 0; 2) μ(X) = 1; 3) A ⊆ B ⇒ μ(A) ≤ μ(B) (monotonicity).

In addition, the measure μ meets certain additional conditions related to the specific pattern recognition task. We call such measures monotone information measures. In [1], information measures were used to pose and solve the problem of finding a minimal informative polygonal representation of a curve.

In certain pattern recognition tasks, in particular in image processing, image analysis and image recognition, the random nature of image features may be caused by noise. For example, if the pattern is a discrete plane curve extracted from an image and the features are characteristics of curve points (e.g., an estimate of the curvature at a given point of the discrete curve [3]), then the random character of the features (e.g., the curvature) is caused by the image noise. In this case the expectation E[M(A)] characterizes the informativeness of a representation A, and the variance σ²[M(A)] characterizes the stability of the representation to pattern noise. This leads to the problem of finding the most stable and informative representation A of the pattern X. The difficulty of this problem is determined by the degree of mutual dependence of the random features. The stochastic informativeness measure M is an additive measure if the features are independent random variables; otherwise it is nonadditive. In this article the problem is posed and solved in the case of probabilistically independent random features. The case of dependent features is investigated in [4].

The Averaged Set Function of Information of a Pattern

In the general case a feature may depend on all or some of the elements of the representation A on which it is calculated, i.e. tmp1411-566_thumb. For example, let tmp1411-567_thumb be a discrete closed plane curve: tmp1411-568_thumb and tmp1411-569_thumb, where tmp1411-570_thumb is the point that follows the point g in the ordered representation A, or tmp1411-571_thumb is some estimate of the curvature of a plane discrete curve calculated in the tmp1411-572_thumb-neighbourhood of the point g [3].

The degree of completeness of a representation A ⊆ X in describing the pattern X with respect to a feature ω may be defined by the following set function

μω(A) = Σx∈A ω(x, A) / Σx∈X ω(x, X),    (1)

which we call the averaged set function of information of the pattern X (with respect to the feature ω). We put μω(A) = 0 if the value ω(x, A) is not defined for the set A (in particular, μω(∅) = 0).

Example 1. Let X = Γ be a discrete closed plane curve Γ = {g1, ..., gN}. We also introduce the set functions μL(B) = L(B)/L(Γ) and μS(B) = S(B)/S(Γ), where L(B) and S(B) are, respectively, the perimeter and the area of the figure bounded by the polygon with vertices at the points of the ordered set B. Then μL and μS are averaged set functions of information of the curve Γ of form (1), where ω(g, A) = ρ(g, g′), the distance from g to the point g′ that follows g in the ordered set A, for the set function μL, and ω(g, A) = ½|r(g) × r(g′)| for the set function μS, where r(g) is the radius vector of the point g ∈ A with respect to an arbitrary point O. Note that μL is a monotone measure, and μS is a monotone measure as well if the domain bounded by the polygons with vertices at the points of the discrete curve Γ is convex.
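As a numerical illustration of Example 1, the following sketch (assuming μL(B) = L(B)/L(Γ), with Γ a convex discrete curve sampled from a circle) checks that μL is normalized and monotone along a chain of subsets:

```python
import math

def perimeter(points):
    """Perimeter of the closed polygon with the given ordered vertices."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def mu_L(subset_idx, curve):
    """Averaged information measure mu_L(B) = L(B) / L(curve): the perimeter
    of the polygon on the ordered subset B, normalized by the full perimeter."""
    B = [curve[i] for i in sorted(subset_idx)]
    if len(B) < 3:
        return 0.0  # the value is not defined for such B, so mu_L(B) = 0
    return perimeter(B) / perimeter(curve)

# Discrete convex closed curve: N points on the unit circle.
N = 12
curve = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
         for k in range(N)]

a = mu_L(range(0, N, 2), curve)  # every second point of the curve
b = mu_L(range(N), curve)        # the whole curve: mu_L(X) = 1
assert 0.0 < a < b == 1.0        # monotone on this chain: A is a subset of X
```

By the triangle inequality, adding a curve point to the vertex set can only increase the polygon's perimeter, which is why μL is monotone without the convexity assumption needed for μS.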

A natural question is: when is an averaged set function of information a monotone measure? It can easily be shown that the following proposition holds.

Proposition 1. An averaged set function of information μ on 2^X of form (1) is a monotone measure iff μ obeys the following condition

tmp1411-619_thumb

and, for any set,

tmp1411-620_thumb

and

tmp1411-621_thumb

Corollary 1. If ω(x, A) = ω(x) for all x ∈ A and A ∈ 2^X, then the set function μ of form (1) is an additive measure on 2^X.

Let tmp1411-626_thumb be some representation of the pattern X. We denote tmp1411-627_thumb.

Corollary 2. Let tmp1411-628_thumb and tmp1411-629_thumb tmp1411-630_thumb; then the set function tmp1411-631_thumb of form (1) is a monotone measure iff

tmp1411-638_thumb tmp1411-639_thumb tmp1411-642_thumb

for all tmp1411-643_thumb and tmp1411-644_thumb. In particular, let tmp1411-645_thumb for all tmp1411-646_thumb and tmp1411-647_thumb; then the set function tmp1411-648_thumb of form (1) is a monotone measure if tmp1411-649_thumb for all tmp1411-650_thumb and tmp1411-651_thumb, where tmp1411-652_thumb is the point that follows the point x in the ordered representation A.

The Stochastic Additive Averaged Information Measure

Let μ be an averaged information measure on 2^X of form (1) with ω(x, A) = ω(x) for all x ∈ A and A ∈ 2^X; in other words, the feature value at a point x does not depend on the considered representation. The estimate of the curvature in a neighbourhood of a point g of a discrete plane curve Γ is an example of such a feature. Then (see Corollary 1) the measure μ(A) = Σx∈A ω(x) / Σx∈X ω(x) is an additive measure. Suppose now that the feature characteristics are independent random variables ω(x), x ∈ X. In this case the value of the information measure M(A) for a fixed set A ∈ 2^X is a random variable too. The set function M(A) is

an additive measure for any random event. We consider an example of this situation. Suppose that a discrete plane curve Γ = {g1, ..., gN} is subjected to additive uncorrelated stochastic noise. As a result we obtain a random curve Γ̃ = {g̃1, ..., g̃N}, where g̃k = gk + εk and the εk are uncorrelated random variables with E[εk] = 0 and σ²[εk] = σ² for all k. Suppose that there is a basic set B ⊂ Γ for which the random variables γ̃(g), g ∈ B, are independent. In this case the feature characteristics ω(g) = γ̃(g) are random variables, and so is the value of the measure M(A), A ∈ 2^B.

Let E[ω(x)] = m_x and σ²[ω(x)] = σ²_x. We will investigate the numerical characteristics of the random additive measure M(A), A ∈ 2^X.

The Numerical Characteristics of the Stochastic Additive Averaged Information Measure

Let us find the mathematical expectation of the random variable M(A) for a fixed A ∈ 2^X. The random variable M(A) is equal to the quotient of two random variables, ξ = Σx∈A ω(x) and η = Σx∈X ω(x).

Lemma 1. Let ξ and η be random variables taking values in intervals l_ξ and l_η, respectively, on the positive semiaxis, with l_η ⊂ ((1 − δ)E[η], (1 + δ)E[η]) and l_ξ ⊂ (E[ξ] − δE[η], E[ξ] + δE[η]). Then the following formulas for the mean and the variance of the distribution of ξ/η are valid:

tmp1411-697_thumb

where K[ξ, η] = E[(ξ − E[ξ])(η − E[η])] is the covariance of the random variables ξ and η, and r1, r2 are residuals that depend on the numerical characteristics of ξ and η.
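Formulas (4) and (5) are not legible above; to the stated order they presumably coincide with the standard second-order Taylor (delta-method) expansions for the mean and variance of a ratio of random variables, reconstructed here as an assumption:

```latex
\mathbf{E}\!\left[\frac{\xi}{\eta}\right]
  = \frac{\mathbf{E}[\xi]}{\mathbf{E}[\eta]}
  - \frac{K[\xi,\eta]}{\mathbf{E}^{2}[\eta]}
  + \frac{\mathbf{E}[\xi]\,\sigma^{2}[\eta]}{\mathbf{E}^{3}[\eta]} + r_1,
\qquad
\sigma^{2}\!\left[\frac{\xi}{\eta}\right]
  = \frac{\sigma^{2}[\xi]}{\mathbf{E}^{2}[\eta]}
  - \frac{2\,\mathbf{E}[\xi]\,K[\xi,\eta]}{\mathbf{E}^{3}[\eta]}
  + \frac{\mathbf{E}^{2}[\xi]\,\sigma^{2}[\eta]}{\mathbf{E}^{4}[\eta]} + r_2.
```

The interval conditions on l_ξ and l_η guarantee that η stays bounded away from zero, which is what makes the Taylor expansion of ξ/η around (E[ξ], E[η]) valid with bounded residuals.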

tmp1411-698_thumbtmp1411-699_thumbtmp1411-700_thumb

Notice that σ²[M(X\A)] = σ²[1 − M(A)] = σ²[M(A)] for A ∈ 2^X. We will use formulas (4) and (5) without their residuals. The respective values Ê[M(A)] = E[M(A)] − r1 and σ̂²[M(A)] = σ²[M(A)] − r2 we call estimates of the numerical characteristics. The class of random variables {M(A) : A ∈ 2^X} satisfies all the requirements of a finitely additive stochastic measure [5]: 1) E[M²(A)] < ∞ for all A ∈ 2^X; 2) M(A) is almost surely a finitely additive measure.

Note that the mathematical expectation E[M(A)] defines a set function on 2^X and tmp1411-701_thumb. Since S(A) and D(A) are additive set functions, the measure E[M(A)] is additive too.
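Reading S(A) and D(A) as the sums of the per-point means m_x and variances σ²_x (an assumed interpretation, since the formula above is not legible), the additivity of the numerical characteristics of the nonnormalized measure ξ(A) = Σx∈A ω(x) for independent features can be checked by a small Monte Carlo simulation; the per-point statistics below are hypothetical:

```python
import random

random.seed(0)

# Hypothetical per-point feature statistics: E[omega(x)] = m_x, var = s2_x.
m  = [1.0, 2.0, 0.5, 3.0]
s2 = [0.01, 0.09, 0.04, 0.16]

def sample_xi():
    """One realization of xi(A): the sum of independent random features."""
    return sum(random.gauss(mi, vi ** 0.5) for mi, vi in zip(m, s2))

trials = 200_000
xs = [sample_xi() for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials

# For independent features: E[xi(A)] = S(A) = sum of m_x and
# sigma^2[xi(A)] = D(A) = sum of s2_x, i.e. both set functions are additive.
assert abs(mean - sum(m)) < 0.02
assert abs(var - sum(s2)) < 0.02
```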

Finding an Optimal Stable Pattern Representation

We pose the problem of finding a representation B of the pattern X whose cardinality is at most a given number tmp1411-702_thumb, for which the summarized variance tmp1411-703_thumb is minimal while the sum of squares of the mathematical expectations over all representations tmp1411-704_thumb is maximal. The summarized variance characterizes the stability of the representation and of all its subsets to the noise level of the pattern; it also depends on the number of elements in the representation: the more elements the representation has, the greater the summarized variance. To simplify the computations we use the mathematical expectations of the nonnormalized information measures tmp1411-705_thumb instead of tmp1411-706_thumb. We introduce the following criterion:

tmp1411-713_thumb

Then it is necessary to find a set tmp1411-714_thumb for which tmp1411-715_thumb.

We simplify the function f(B). Let tmp1411-716_thumb.

Proposition 2. If the feature characteristics of the pattern X are independent random variables, then the following equality holds for every tmp1411-723_thumb:

tmp1411-724_thumbtmp1411-725_thumb

Corollary 3. If

tmp1411-726_thumbtmp1411-727_thumb 

Since

tmp1411-728_thumb

the following corollary is true.

Corollary 4. If

We will use an "inclusion-exclusion" procedure for finding an optimal representation that minimizes the function f(B). We estimate the variation of the function f(B) under the exclusion of an element x from B and the inclusion of an element y ∈ X\B.

Theorem 1. If the feature characteristics of the pattern X are independent random variables, then the following asymptotic equality

tmp1411-731_thumb

where

tmp1411-732_thumb

tmp1411-733_thumb

is true for every

tmp1411-734_thumb

Corollary 5.

tmp1411-735_thumb

The formulas from Theorem 1 and Corollary 5 may be used to construct algorithmic procedures for finding a representation B that minimizes the function f. Let tmp1411-736_thumb. The algorithm for finding a representation B of cardinality k that minimizes the function f (when the random variables tmp1411-737_thumb are independent) consists of the following steps:

1) we select the set B0 consisting of the k elements of the pattern X with the maximal values of the informativeness tmp1411-738_thumb as the initial representation;

2) we compute the value tmp1411-739_thumb from Theorem 1 and find tmp1411-740_thumb = argmin tmp1411-741_thumb, if such a pair exists. Then tmp1411-742_thumb is a new representation for which tmp1411-743_thumb, accurate to within small values of second order. This step is repeated as long as such pairs tmp1411-744_thumb exist.
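The two steps above can be sketched in Python. The criterion f and the per-element statistics are stand-ins, since the paper's exact formulas are not legible in this copy: here f(B) sums, over the elements of B, the feature variance minus the squared feature mean, so that minimizing f trades noise against informativeness:

```python
def optimal_representation(X, k, f, info):
    """Greedy 'inclusion-exclusion' search: start from the k elements with
    maximal informativeness info(x), then repeatedly swap one element of B
    for one outside element while this strictly decreases f(B)."""
    X = set(X)
    # Step 1: initial representation = k elements with maximal informativeness.
    B = set(sorted(X, key=info, reverse=True)[:k])
    while True:
        # Step 2: evaluate every exclusion/inclusion pair (x, y).
        pairs = [((x, y), f((B - {x}) | {y})) for x in B for y in X - B]
        if not pairs:
            return B
        (x, y), val = min(pairs, key=lambda p: p[1])
        if val >= f(B):
            return B  # no pair improves the criterion any further
        B = (B - {x}) | {y}

# Hypothetical per-element feature means m and variances s2, and a stand-in
# separable criterion: summed variance minus summed squared mean over B.
m  = {0: 3.0, 1: 1.0, 2: 2.5, 3: 0.5, 4: 2.0}
s2 = {0: 0.1, 1: 0.1, 2: 10.0, 3: 0.1, 4: 0.2}
f = lambda B: sum(s2[x] - m[x] ** 2 for x in B)
info = lambda x: m[x] ** 2

B = optimal_representation(m.keys(), 3, f, info)
# Element 2 has a large mean but is very noisy, so the swap step removes it.
assert B == {0, 1, 4}
```

For a separable criterion like this stand-in, the swap loop simply converges to the k elements with the smallest per-element scores, in line with the conclusion drawn from Corollary 5 below.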

If tmp1411-745_thumb, then the following conclusion follows from Corollary 5 if we disregard small values: the optimal representation B of cardinality at most k that minimizes the function f consists of the elements with the greatest values tmp1411-746_thumb (under the assumption that the random variables tmp1411-747_thumb are independent).
