Speech Enhancement - Digital Speech: Coding for Low Bit Rate Communication Systems

11.2 Review of STSA-based Speech Enhancement

Assuming that the noise $d(n)$ is additive to the speech signal $x(n)$, the noisy speech $y(n)$ can be written as,

$$y(n) = x(n) + d(n), \quad \text{for } 0 \le n \le K - 1 \tag{11.1}$$

where $n$ is the time index. The objective of speech enhancement is to find the enhanced speech $\hat{x}(n)$ given $y(n)$, with the assumption that $d(n)$ is uncorrelated with $x(n)$. The time-domain signals can be transformed to the frequency domain as,

$$Y_k = X_k + D_k, \quad \text{for } 0 \le k \le K - 1 \tag{11.2}$$

where $Y_k$, $X_k$, and $D_k$ denote the short-time DFT of $y(n)$, $x(n)$, and $d(n)$, respectively. STSA-based speech enhancement filters out the noise by modifying the spectral amplitudes of $Y_k$ in equation (11.2). Therefore, the enhanced spectrum $\hat{X}_k$ can be written in terms of the modification factor (gain) $G_k$ and the noisy spectrum $Y_k$ as,

$$\hat{X}_k = G_k Y_k, \quad \text{for } 0 \le G_k \le 1 \tag{11.3}$$
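As an illustration of equations (11.1) to (11.3), the following minimal NumPy sketch processes a single frame. The synthetic signals, the frame length of 256 samples, the helper name `enhance_frame`, and the "oracle" gain computed from the known clean and noise spectra are all assumptions made for demonstration; a practical system has no access to $X_k$ or $D_k$ and must estimate the gain from the noisy input.

```python
import numpy as np

def enhance_frame(noisy_frame, gain):
    """Apply a real-valued spectral gain to one frame: X_hat_k = G_k * Y_k."""
    Y = np.fft.rfft(noisy_frame)       # short-time DFT of the noisy frame
    G = np.clip(gain, 0.0, 1.0)        # enforce 0 <= G_k <= 1 as in (11.3)
    X_hat = G * Y                      # amplitudes are scaled, the phase of Y_k is kept
    return np.fft.irfft(X_hat, n=len(noisy_frame))

# Synthetic single-frame example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
K = 256                                # assumed frame length
n = np.arange(K)
x = np.sin(2 * np.pi * 8 * n / K)      # stand-in for clean speech
d = 0.3 * rng.standard_normal(K)       # additive noise
y = x + d                              # noisy signal, equation (11.1)

# The DFT is linear, so the additive model carries over to (11.2).
X, D, Y = np.fft.rfft(x), np.fft.rfft(d), np.fft.rfft(y)
assert np.allclose(Y, X + D)

# Oracle gain built from the true spectra, used here only to exercise (11.3).
eps = 1e-12                            # guards against division by zero
oracle_gain = np.abs(X) ** 2 / (np.abs(X) ** 2 + np.abs(D) ** 2 + eps)
x_hat = enhance_frame(y, oracle_gain)
```

Because the gain is real and non-negative, only the spectral amplitudes are modified, which matches the STSA formulation in which the noisy phase is reused for the enhanced signal.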

The gain $G_k$ is a function of the a posteriori SNR,

$$\gamma_k \equiv \frac{|Y_k|^2}{E(|D_k|^2)} \tag{11.4}$$

and the a priori SNR,

$$\xi_k \equiv \frac{E(|X_k|^2)}{E(|D_k|^2)} \tag{11.5}$$

where $E(|D_k|^2)$ and $E(|X_k|^2)$ are the statistical variances of the $k$th spectral components of the noise and the speech, respectively. The functional form of the gain $G_k$ depends on the specific enhancement method. The a posteriori SNR $\gamma_k$ in equation (11.4) can be obtained easily, as $Y_k$ is the input noisy spectrum and $E(|D_k|^2)$ can be obtained through a noise adaptation procedure discussed in Section 11.3. However, the speech variance $E(|X_k|^2)$ needed to estimate $\xi_k$ in equation (11.5) is not available. As a solution, Ephraim and Malah [10] proposed the decision-directed (DD) method given by,

$$\hat{\xi}_k^{(t)} = \alpha \, \frac{|\hat{X}_k^{(t-1)}|^2}{E(|D_k^{(t)}|^2)} + (1 - \alpha)\,\mathrm{MAX}\!\left(\gamma_k^{(t)} - 1,\; 0\right) \tag{11.6}$$

where $0 \le \alpha < 1$ and $t$ is the frame index.
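The decision-directed update in equation (11.6) can be sketched in a few lines of NumPy. The function names, the toy three-bin inputs, and the default $\alpha = 0.98$ are assumptions chosen for illustration and are not taken from this section; the noise variance is assumed to be supplied by a separate noise estimator.

```python
import numpy as np

def a_posteriori_snr(noisy_spectrum, noise_var):
    """gamma_k = |Y_k|^2 / E(|D_k|^2), as in equation (11.4)."""
    return np.abs(noisy_spectrum) ** 2 / noise_var

def decision_directed_xi(prev_enhanced_spectrum, noise_var, gamma, alpha=0.98):
    """A priori SNR estimate following equation (11.6).

    prev_enhanced_spectrum: enhanced spectrum of frame t-1
    noise_var:              estimate of E(|D_k|^2), assumed given
    gamma:                  a posteriori SNR of the current frame t
    alpha:                  weighting factor, 0 <= alpha < 1 (assumed default)
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    previous_term = np.abs(prev_enhanced_spectrum) ** 2 / noise_var
    current_term = np.maximum(gamma - 1.0, 0.0)   # MAX(gamma - 1, 0)
    return alpha * previous_term + (1.0 - alpha) * current_term

# Toy three-bin example with hand-checkable numbers.
noise_var = np.array([1.0, 1.0, 2.0])
Y_t = np.array([2.0 + 0.0j, 0.5j, 2.0 + 0.0j])    # current noisy spectrum
X_hat_prev = np.array([1.0, 0.0, 2.0])            # previous enhanced spectrum

gamma = a_posteriori_snr(Y_t, noise_var)          # [4.0, 0.25, 2.0]
xi = decision_directed_xi(X_hat_prev, noise_var, gamma, alpha=0.5)
print(xi)                                         # [2.0, 0.0, 1.5]
```

A value of $\alpha$ close to 1 weights the previous frame heavily and smooths the estimate over time, while a smaller $\alpha$ lets the estimate follow the instantaneous term $\gamma_k^{(t)} - 1$ more closely.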
