ADAPTING A HMM-BASED RECOGNISER FOR NOISY SPEECH ENHANCED BY SPECTRAL SUBTRACTION
J. A. Nolazco Flores and S. J. Young
April 1993
Training HMMs under the same conditions as those met in recognition makes the models learn not only the features of the speech but also those of the environment. Such matched training gives better recognition performance, but building models for every possible environment is impractical. One way to solve this problem is to compensate models trained on clean speech to give 'artificially' adapted models. The goal of these noise adaptation techniques is to reach the same recognition performance as would be obtained by training in the noisy conditions. Parallel Model Combination (PMC) is one adaptation technique that has been successful in adapting a clean speech model to noise by automatically generating 'noisy speech models'.
However, even training in noise can achieve only limited recognition performance, because the high variance at low SNR makes the features begin to overlap, which makes the discrimination problem more difficult. The problem is even worse when the vocabulary grows; for example, some experiments have shown that recognition performance is below 80% at 0 dB SNR, even when training and testing were in the same environment. Therefore, in very noisy environments, or when the vocabulary grows, even training in noise is not enough to obtain good recognition performance. To improve recognition performance in very noisy environments, some sort of enhancement technique may be useful. An enhancement scheme could improve the SNR, minimise the variance, or emphasise the main features of the signal of interest. However, all of these improvements usually come at the expense of signal distortion. If both signal distortion and noise are minimised, a signal with better features and lower variability is obtained. To exploit the strengths of both the noise adaptation techniques and the enhancement techniques, we therefore need to compensate the speech models for the distorted signal. In other words, we need to adapt the models to the enhanced signal.
In this work, we study how to adapt clean speech models to a signal enhanced by Spectral Subtraction (SS). This scheme improves the SNR, but at the expense of signal distortion. Nevertheless, it has been successful for signal enhancement and for speech recognition in noisy environments. Here, the signal distortion is compensated so that SS can deal with very noisy environments. It will be shown that the signal distortion can be represented in the linear domain by a correction term. PMC transforms the noise and speech model parameters from the cepstral domain to the linear domain, adds these parameters, and then creates an adapted model by returning to the cepstral domain. Therefore, PMC can be modified to compensate an SS-distorted signal in the linear domain by including the correction term. This modified version of PMC will be called the SS-PMC method.
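The two steps described above can be illustrated with a minimal sketch. It is not the method of the report: the function names, the over-subtraction factor `alpha`, the spectral floor `beta`, and the `correction` argument are all hypothetical, and the actual form of the SS correction term is not given in this abstract. The sketch assumes cepstra are an orthonormal DCT of log spectra, combines means only (a log-add style approximation that ignores variances), and treats the correction as an additive term in the linear spectral domain that defaults to zero.

```python
import math

def spectral_subtract(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction for one frame, with a spectral floor.

    alpha (over-subtraction) and beta (floor) are assumed parameters.
    """
    out = []
    for y, n in zip(noisy_power, noise_power):
        cleaned = y - alpha * n          # subtract the noise estimate
        out.append(max(cleaned, beta * y))  # floor avoids negative power
    return out

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def combine_means(speech_cep, noise_cep, correction=None):
    """Mean-only combination: cepstral -> linear, add, -> cepstral.

    `correction` is a placeholder for an additive linear-domain term;
    its true form is an assumption here and defaults to zero.
    """
    n = len(speech_cep)
    c = dct_matrix(n)
    c_inv = transpose(c)
    # Cepstral -> log-spectral -> linear spectral domain.
    speech_lin = [math.exp(x) for x in matvec(c_inv, speech_cep)]
    noise_lin = [math.exp(x) for x in matvec(c_inv, noise_cep)]
    if correction is None:
        correction = [0.0] * n
    # Add speech, noise and correction; guard against non-positive values
    # before taking the logarithm (the guard value is arbitrary).
    combined = [max(s + nz + d, 1e-12)
                for s, nz, d in zip(speech_lin, noise_lin, correction)]
    # Linear -> log-spectral -> cepstral domain.
    return matvec(c, [math.log(v) for v in combined])
```

For example, `combine_means(speech_cep, noise_cep)` returns adapted cepstral means for one state, and passing a `correction` list shifts the linear-domain sum before the return to the cepstral domain. A full implementation would also have to adapt the variances, which this sketch leaves out.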
The results obtained with the SS-PMC technique are very encouraging: they show that adaptation techniques are effective in compensating for the signal distortion that is a side effect of an SS-based enhancement scheme.