w is the concatenation of the (n, 1) vector parameter a and of the scalar parameter b; Y is a second-order real random variable. We have
\[
J(a, b) = E\big[(Y - Xa - b)^2\big].
\]
The data samples $(X_1, Y_1), \dots, (X_k, Y_k), \dots$ are available on-line to solve the estimation problem. Since they are independent, the stochastic gradient approach may be used. The recursive stochastic gradient estimate is defined by the following formulas:
\[
\begin{aligned}
a_{k+1} &= a_k + \gamma_{k+1}\,(Y_{k+1} - X_{k+1} a_k - b_k)\, X_{k+1},\\
b_{k+1} &= b_k + \gamma_{k+1}\,(Y_{k+1} - X_{k+1} a_k - b_k).
\end{aligned}
\]
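To make the recursion concrete, here is a minimal Python sketch. Everything about the data (the true coefficients `a_true` and `b_true`, the Gaussian inputs, and the noise level) is assumed purely for illustration, and the gain $\gamma_k = 1/k$ is one admissible choice, as discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth for the synthetic data (not from the text).
n = 3
a_true = np.array([2.0, -1.0, 0.5])
b_true = 0.7

a = np.zeros(n)  # current estimate a_k, initialized at zero
b = 0.0          # current estimate b_k

for k in range(1, 200_001):
    X = rng.normal(size=n)                         # independent input sample
    Y = X @ a_true + b_true + 0.1 * rng.normal()   # noisy observation
    gamma = 1.0 / k                                # gain sequence gamma_k = 1/k
    err = Y - X @ a - b                            # prediction error Y - X a_k - b_k
    a = a + gamma * err * X                        # recursive update of a
    b = b + gamma * err                            # recursive update of b

print(a, b)  # approaches the regression coefficients of Y with respect to X
```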
We have the following convergence statement:
If the gain of the algorithm obeys the conditions
\[
\sum_{k=1}^{\infty} \gamma_k = \infty, \qquad \sum_{k=1}^{\infty} \gamma_k^2 < \infty,
\]
then the algorithm converges almost surely to the linear regression coefficients of Y with respect to X.
The conditions on the gain that have just been stated are general. Hereinafter, they will be referred to as the stochastic approximation conditions for the gain. In particular, the sequence $\gamma_k = 1/k$ obeys those conditions.
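Indeed, for this sequence both conditions can be checked directly:
\[
\sum_{k=1}^{\infty} \frac{1}{k} = \infty \quad \text{(harmonic series)}, \qquad
\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6} < \infty .
\]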
4.3.3 Recursive Identification of an AR Model
Consider the identification problem of the AR(p) model
\[
X(k+1) = a_1 X(k) + \dots + a_p X(k-p+1) + V(k+1).
\]
We assume that the data are collected under a stationary regime, and we are looking for a recursive estimate that minimizes the least-squares criterion
\[
J(w) = \tfrac{1}{2}\, E\big[\big(X(k+1) - a_1 X(k) - \dots - a_p X(k-p+1)\big)^2\big].
\]
The gradient of the cost function is
\[
\nabla J(w) = -E\big\{\big[X(k+1) - a_1 X(k) - \dots - a_p X(k-p+1)\big]\,\big[X(k); \dots; X(k-p+1)\big]\big\}.
\]
Thus, the stochastic gradient recursive estimate is defined by the algorithm
\[
w(k+1) = w(k) + \gamma_{k+1}\,\vartheta(k+1)\,\big[X(k); \dots; X(k-p+1)\big],
\]
with $\vartheta(k+1) = X(k+1) - a_1 X(k) - \dots - a_p X(k-p+1)$.
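The same pattern applies here; the following sketch simulates data from a hypothetical stable AR(2) (the coefficients 0.5 and −0.3 and the zero initial state are assumptions chosen for illustration, not taken from the text) and runs the recursion on it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stable AR(2) used to generate data:
# X(k+1) = 0.5 X(k) - 0.3 X(k-1) + V(k+1), with V white Gaussian noise.
a_true = np.array([0.5, -0.3])
p = a_true.size

w = np.zeros(p)    # estimate of (a_1, ..., a_p)
phi = np.zeros(p)  # regressor [X(k); ...; X(k-p+1)], zero initial state

for k in range(1, 200_001):
    x_next = a_true @ phi + rng.normal()        # next observation X(k+1)
    gamma = 1.0 / k                             # stochastic approximation gain
    theta = x_next - w @ phi                    # prediction error, the term called theta(k+1)
    w = w + gamma * theta * phi                 # delta-rule update of w
    phi = np.concatenate(([x_next], phi[:-1]))  # shift the regressor window

print(w)  # approaches the true coefficients (0.5, -0.3)
```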
This rule was encountered previously and has long been known as the delta rule or Widrow rule. If the gain sequence obeys the stochastic approximation conditions, the algorithm converges, so that the estimate is consistent.
In the case of AR models, the input-output data are no longer independent.
Therefore, the classical assumptions of the elementary law of large numbers
are not fulfilled. The following Markov linear model produces the data: