Abstract for logan_thesis

PhD Thesis, University of Cambridge

ADAPTIVE MODEL-BASED SPEECH ENHANCEMENT

B. T. Logan

Oct 1998

This dissertation details the development and evaluation of techniques for enhancing speech corrupted by unknown, independent, additive noise when only a single microphone is available. It therefore seeks to address a deficiency of many speech enhancement systems: they require a priori knowledge of the statistics of the interfering noise. This deficiency must be corrected if such systems are to operate in real-world situations.

The enhancement systems developed here are based on an existing system by Ephraim (1992), which models the speech and noise statistics using autoregressive hidden Markov models (AR-HMMs). Two main extensions make this technique adaptive. The first estimates the noise statistics from pauses detected in the speech. The second forms maximum likelihood estimates of the unknown noise parameters using the whole utterance. Both techniques operate within the AR-HMM framework.
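The abstract does not describe the pause detector itself, so the following is only a rough illustration of the first extension. It estimates a noise power spectrum by averaging periodograms over frames judged to be pauses. It assumes NumPy and replaces a real pause detector with a crude rule: the lowest-energy 10% of frames are treated as pauses. The function names, frame sizes and the `pause_fraction` parameter are hypothetical. In the AR-HMM framework the estimated noise statistics would go on to define an AR noise model, which this sketch omits.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row.
    Assumes len(x) >= frame_len."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def estimate_noise_psd(x, frame_len=256, hop=128, pause_fraction=0.1):
    """Hypothetical pause-based noise estimate.

    The lowest-energy frames stand in for detected pauses (an assumption,
    not the thesis's detector); their windowed periodograms are averaged.
    Returns the averaged noise power spectrum and the pause frame indices.
    """
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    energy = np.sum(frames ** 2, axis=1)
    n_pause = max(1, int(pause_fraction * len(frames)))
    pause_idx = np.argsort(energy)[:n_pause]          # quietest frames
    spectra = np.abs(np.fft.rfft(frames[pause_idx], axis=1)) ** 2
    return spectra.mean(axis=0), pause_idx
```

With white noise of variance sigma^2 and a Hann window, each averaged bin should come out near sigma^2 times the sum of the squared window values, slightly lower because the quietest frames are chosen.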

Further work in this dissertation improves the modelling power of AR-HMM systems by incorporating a perceptual frequency scale. The bilinear transform is used to warp the frequency axis of the feature vectors to an approximation of the Bark scale. This modification can be applied in both AR-HMM recognition and enhancement systems.
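A first-order all-pass (bilinear) substitution, replacing z^-1 with (z^-1 - alpha)/(1 - alpha z^-1), maps normalised frequency w to w' = w + 2 arctan(alpha sin w / (1 - alpha cos w)). For 0 < alpha < 1 it stretches the low-frequency region, which is how it approximates the Bark scale. The sketch below assumes NumPy. It implements that mapping and samples an AR power spectrum on a grid that is uniform along the warped axis. The helper names are hypothetical. The value alpha = 0.5 used in the checks is illustrative only, since the abstract does not give the coefficient used in the thesis.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Map normalised frequency omega (radians, 0..pi) through first-order
    all-pass warping with coefficient alpha, |alpha| < 1."""
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

def bilinear_unwarp(omega_w, alpha):
    """Inverse mapping: warping with -alpha undoes warping with alpha."""
    return bilinear_warp(omega_w, -alpha)

def ar_power_spectrum(a, gain, omega):
    """Power spectrum gain / |A(e^{j omega})|^2 of an AR model with
    A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    k = np.arange(1, len(a) + 1)
    A = 1.0 + np.exp(-1j * np.outer(omega, k)) @ np.asarray(a)
    return gain / np.abs(A) ** 2

def warped_ar_spectrum(a, gain, alpha, n_points=64):
    """Hypothetical helper: evaluate the AR spectrum at the linear
    frequencies that correspond to a uniform grid on the warped axis."""
    omega_w = np.linspace(0.0, np.pi, n_points)
    return omega_w, ar_power_spectrum(a, gain, bilinear_unwarp(omega_w, alpha))
```

Because the inverse of the all-pass substitution is the same substitution with -alpha, `bilinear_unwarp` simply reuses `bilinear_warp`. Both end points, 0 and pi, map to themselves.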

The enhancement techniques are evaluated on the NOISEX-92 and Resource Management (RM) databases, which indicate performance on a simple and a more complex task respectively. Additional experiments investigating the incorporation of perceptual frequency into AR-HMM systems were conducted on the E-set of the speaker-independent ISOLET database.

Both proposed enhancement schemes improved substantially on the baseline results. Forming maximum likelihood estimates of the noise parameters proved the more effective of the two. Its performance was evaluated at signal-to-noise ratios from -6 dB to 18 dB and on several types of stationary real-world noise.

Incorporating perceptual frequency into AR-HMM systems substantially increased recognition performance on both the ISOLET and RM databases. The improvement was smaller on the more complex RM task, suggesting that AR-HMMs could benefit from including more variance information.
