Rapid and Robust Environment Aware Processing

Project Description

One of the fundamental problems with deploying automatic speech recognition (ASR) systems is that they must be able to operate in a wide range of acoustic environments. As the acoustic environment may vary dramatically, for example moving from a quiet office to a moving car with high levels of background noise, it is essential that ASR systems can detect, and adapt to, these changing conditions. The overall aim of the project is to develop approaches that allow ASR systems to respond to changing acoustic conditions while maintaining high levels of recognition accuracy. The schemes developed should be flexible, in that they should be applicable to a wide range of tasks, for example both small and large vocabulary systems. At the same time, the computational load associated with the techniques should be tunable depending on the nature of the environment and the available resources. This project will build on the current research work on Joint Uncertainty Decoding (JUD), which has been applied to a range of tasks from digit strings (AURORA2) to large vocabulary continuous speech recognition (Broadcast News Transcription).

The research to be carried out may be split into three distinct areas:

Rapid environment adaptation. Two related issues need to be addressed for rapid environment adaptation. First, the estimation of the noise environment must be rapid, both in terms of the time delay incurred and the cost of the environment estimation process itself. Second, having estimated the environment parameters, the noise compensation process itself must have minimal computational overhead.
Environment change tracking/detection. This problem may be addressed in two distinct ways. The first is to detect when the environment has changed sufficiently to adversely affect recognition, and thus warrant updating the model parameters. The second is to use a scheme that continually monitors the environment and adapts the parameters.
Improved robustness. Though techniques such as uncertainty decoding yield significant gains in performance, there can be unacceptably large degradations in performance as the signal-to-noise ratio (SNR) decreases. Using HMMs alone is unlikely to address this problem. An alternative is to use discriminative models, such as support vector machines (SVMs), in combination with HMMs. One initial direction is the code-breaking framework, which uses HMMs in conjunction with SVMs based on JUD-compensated generative kernels.
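The two strategies for environment change tracking/detection can be illustrated with a minimal sketch. This is not the project's method: it assumes the environment is summarised by a per-dimension noise mean and variance, and the function names (detect_change, track_mean) and parameters (threshold, forgetting) are hypothetical, chosen only for illustration.

```python
import numpy as np


def detect_change(frames, ref_mean, ref_var, threshold=3.0):
    """Strategy 1 (hypothetical): flag a change when a block of frames
    drifts too far from the reference noise statistics."""
    frames = np.asarray(frames, dtype=float)
    block_mean = frames.mean(axis=0)
    # Standard error of the block mean under the reference variance;
    # the floor avoids division by zero for degenerate dimensions.
    std_err = np.sqrt(np.maximum(ref_var, 1e-8) / len(frames))
    z = np.abs(block_mean - ref_mean) / std_err
    # Average the normalised deviation over feature dimensions.
    return bool(z.mean() > threshold)


def track_mean(frames, init_mean, forgetting=0.95):
    """Strategy 2 (hypothetical): continually update the noise mean
    with exponential forgetting, one frame at a time."""
    mean = np.array(init_mean, dtype=float)
    for x in np.asarray(frames, dtype=float):
        mean = forgetting * mean + (1.0 - forgetting) * x
    return mean
```

Under these assumptions, the first function only triggers a model update when the statistics have clearly shifted, so compensation cost is paid rarely. The second pays a small cost on every frame but never needs an explicit decision.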

The project will extend the current implementation of JUD using HTK Version 3.4.

The project is funded by the Speech Technology Group, Toshiba Research Europe Ltd, and has a three-year duration starting in January 2008.

Personnel Associated with the Project

Dr Mark Gales [Principal Investigator]
Dr Federico Flego [Research Associate]
Rogier van Dalen [Research Student]
Dr Dong Kuk Kim [Visitor]

Project and Related Publications

M.J.F. Gales and S.J. Young (1996).
Robust Continuous Speech Recognition using Parallel Model Combination.
IEEE Transactions on Speech and Audio Processing, Volume 4.
H. Liao and M.J.F. Gales (2005).
Joint Uncertainty Decoding for Noise Robust Speech Recognition.
InterSpeech 2005.
H. Liao and M.J.F. Gales (2006).
Issues with Uncertainty Decoding for Noise Robust Speech Recognition.
InterSpeech 2006.
M.J.F. Gales and M.I. Layton (2006).
Training Augmented Models using SVMs.
IEICE Special Issue on Statistical Models for Speech Recognition, March 2006.
H. Liao and M.J.F. Gales (2007).
Adaptive Training with Joint Uncertainty Decoding for Robust Recognition of Noisy Data.
ICASSP 2007.
M.J.F. Gales and R.C. van Dalen (2007).
Predictive Linear Transforms for Noise Robust Speech Recognition.
ASRU 2007.
R.C. van Dalen and M.J.F. Gales (2008).
Covariance Modelling for Noise-Robust Speech Recognition.
InterSpeech 2008.
M.J.F. Gales and C. Longworth (2008).
Discriminative Classifiers with Generative Kernels for Noise Robust ASR.
InterSpeech 2008.
M.J.F. Gales and F. Flego (2008).
Discriminative Classifiers with Generative Kernels for Noise Robust Speech Recognition.
CUED Technical Report CUED/F-INFENG/TR605, August 2008.
D.K. Kim and M.J.F. Gales (2009).
Noisy CMLLR for Noise-Robust Speech Recognition.
CUED Technical Report CUED/F-INFENG/TR611, February 2009.
