
Connectionist Model Combination

 

Connectionist model combination refers to the process of merging the outputs of two or more networks. The original motivation for model merging in the hybrid system came from an analysis of the recurrent network. Unlike a standard HMM, the recurrent network structure is time-asymmetric: a network trained to recognise forward in time develops different dynamics from one trained to recognise backward in time. Because each direction has access to different information, it is reasonable to expect that combining both information sources gives better modelling.

Significant improvements have been observed by simply averaging the network outputs [27], i.e., setting

$$y_i = \frac{1}{K}\sum_{k=1}^{K} y_i^{(k)}$$

where $y_i^{(k)}$ is the estimate of the $k$th model for output class $i$ and $K$ is the number of models. Although this merging has been successful, the approach is somewhat ad hoc. A more principled approach to model merging uses the Kullback-Leibler information as a distance-like measure between multinomial distributions. Consider the following criterion

$$E = \frac{1}{K}\sum_{k=1}^{K} D\big(p \,\|\, y^{(k)}\big) \tag{17}$$

 

where

$$D\big(p \,\|\, q\big) = \sum_i p_i \log \frac{p_i}{q_i}$$

is the Kullback-Leibler information. Minimising $E$ with respect to the distribution $p$ can be interpreted as choosing the distribution that minimises the average (across models) Kullback-Leibler information. Solving the minimisation in (17) results in the log-domain merge of the network outputs, i.e.,

$$\log y_i = \frac{1}{K}\sum_{k=1}^{K} \log y_i^{(k)} - \log B$$

 

where $B$ is a normalisation constant such that $y$ is a probability distribution. This technique has been applied to merging four networks for large vocabulary speech recognition [28]. The four networks were the forward-in-time and backward-in-time networks for each of the MEL+ and PLP acoustic preprocessing schemes described in Section 3.1. Recognition results for three different test sets are reported in Table 1.
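The minimisation in (17) can be carried out with a Lagrange multiplier $\lambda$ (introduced here) for the constraint $\sum_i p_i = 1$. The derivation below assumes, as in (17), that the merged distribution $p$ is the first argument of the Kullback-Leibler information:

$$
\begin{aligned}
\frac{\partial}{\partial p_i}\left[\frac{1}{K}\sum_{k=1}^{K}\sum_j p_j \log\frac{p_j}{y_j^{(k)}} + \lambda\Big(\sum_j p_j - 1\Big)\right]
&= \log p_i + 1 - \frac{1}{K}\sum_{k=1}^{K}\log y_i^{(k)} + \lambda = 0 \\
\Rightarrow\quad \log p_i &= \frac{1}{K}\sum_{k=1}^{K}\log y_i^{(k)} - (1+\lambda)
\end{aligned}
$$

Writing $B = e^{1+\lambda}$ and imposing $\sum_i p_i = 1$ gives

$$B = \sum_j \prod_{k=1}^{K} \big(y_j^{(k)}\big)^{1/K},$$

so the minimiser is the normalised geometric mean of the network outputs. Since $D(p \,\|\, q)$ is convex in $p$, this stationary point is the minimum. If the arguments are swapped, i.e. $\frac{1}{K}\sum_k D\big(y^{(k)} \,\|\, p\big)$ is minimised instead, the same procedure gives $p_i = \frac{1}{K}\sum_k y_i^{(k)}$, the linear average.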

  
Table 1: Merging results for the ARPA 1993 spoke 5 development test, the 1993 spoke 6 development test, and the 1993 hub 2 evaluation test. All tests used a 5,000-word vocabulary and a bigram language model, and the systems were trained on the SI-84 training set.

Whilst the exact gains are task-specific, linear merging of four networks is generally found to give about 17% fewer errors. Log-domain merging performs better, giving approximately 24% fewer errors when four networks are combined.
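To make the difference between the two rules concrete, the following is a minimal Python sketch, assuming each network's output for a single frame is available as a list of per-class probabilities (one list per network). The function names and the `eps` floor on probabilities are illustrative assumptions, not taken from the original system; `average_kl` evaluates the criterion $E$ so that the two merges can be compared under it.

```python
import math
from typing import List, Sequence


def linear_merge(outputs: Sequence[Sequence[float]]) -> List[float]:
    """Arithmetic mean of K probability vectors (simple output averaging)."""
    k = len(outputs)
    # zip(*outputs) groups the K estimates for each class together.
    return [sum(col) / k for col in zip(*outputs)]


def log_merge(outputs: Sequence[Sequence[float]], eps: float = 1e-12) -> List[float]:
    """Normalised geometric mean: log y_i = (1/K) sum_k log y_i^(k) - log B."""
    k = len(outputs)
    # Average the log-probabilities across models; eps guards against log(0).
    mean_logs = [sum(math.log(max(p, eps)) for p in col) / k for col in zip(*outputs)]
    # Subtract the max before exponentiating for numerical stability;
    # this constant factor cancels in the normalisation below.
    m = max(mean_logs)
    unnorm = [math.exp(v - m) for v in mean_logs]
    b = sum(unnorm)  # plays the role of B (scaled by exp(-m))
    return [u / b for u in unnorm]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Kullback-Leibler information D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def average_kl(p: Sequence[float], outputs: Sequence[Sequence[float]]) -> float:
    """Criterion E: mean over models of D(p || y^(k))."""
    return sum(kl(p, y) for y in outputs) / len(outputs)


if __name__ == "__main__":
    # Hypothetical outputs of three networks over three classes for one frame.
    nets = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.6, 0.1, 0.3]]
    lin, geo = linear_merge(nets), log_merge(nets)
    print("linear:", lin, "E =", average_kl(lin, nets))
    print("log:   ", geo, "E =", average_kl(geo, nets))
```

Because the log-domain rule is a normalised geometric mean, a class that any single network considers unlikely is pulled down more strongly than under linear averaging, and its value of $E$ is never larger than that of the linear average. The `eps` floor only matters when a network outputs an exact zero, which would otherwise drive the geometric mean for that class to zero.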



Tony Robinson Sun Jun 4 20:04:56 BST 1995