Connectionist model combination refers to the process of merging the outputs of two or more networks. The original motivation for model merging with the hybrid system came from analysis of the recurrent network. Unlike a standard HMM, the recurrent network structure is time asymmetric: training a network to recognise forward in time results in different dynamics from training it to recognise backwards in time. Since each process has access to different information, it seems reasonable that better modelling can be achieved by combining both information sources.
Significant improvements have been observed by simply averaging the network outputs [27], i.e., setting

$$ y_i(t) = \frac{1}{K} \sum_{k=1}^{K} y_i^{(k)}(t) $$

where $y_i^{(k)}(t)$ is the estimate of the kth model and K is the number of models. Although this merging has been successful, the approach is somewhat ad hoc.
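As a minimal sketch of the linear merge, assuming each network's outputs are stored as an array of per-frame class probabilities: the function name `linear_merge`, the array layout (models, frames, classes) and the example numbers are illustrative assumptions, not part of the original system.

```python
import numpy as np

def linear_merge(outputs):
    """Average the estimates of K models.

    outputs: array-like of shape (K, T, I), i.e. K models, T frames,
    I classes (an assumed layout). Returns an array of shape (T, I).
    """
    outputs = np.asarray(outputs, dtype=float)
    return outputs.mean(axis=0)

# Two hypothetical models, one frame, three classes.
y = np.array([
    [[0.7, 0.2, 0.1]],
    [[0.5, 0.3, 0.2]],
])
merged = linear_merge(y)
```

Because each input row sums to one, the averaged row does too, so no renormalisation is needed for this merge.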
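A more principled approach to model merging is based on using the Kullback-Leibler information as a distance-like measure on multinomial distributions. Consider the following criterion

$$ E = \sum_{k=1}^{K} D\left(p \,\|\, y^{(k)}\right) \tag{17} $$

where the sum runs over the K models with output distributions $y^{(k)}$, and

$$ D(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i} $$

is the Kullback-Leibler information. Minimisation of E with respect to the distribution p can be interpreted as choosing the distribution which minimises the average (across models) Kullback-Leibler information. Solving the minimisation in (17) results in the log-domain merge of the network outputs, i.e.,

$$ \log y_i(t) = \frac{1}{K} \sum_{k=1}^{K} \log y_i^{(k)}(t) - B $$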
where B is a normalisation constant such that y is a probability distribution. This technique has been applied to merging four networks for large vocabulary speech recognition [28]. The four networks represented forward and backward MEL+ and PLP acoustic preprocessing described in section 3.1. Recognition results are reported in table 1 for three different test sets.
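The step from the criterion to the log-domain merge can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' own derivation: the criterion is taken to be the sum over the K models of the Kullback-Leibler information from p to each model's output $y^{(k)}$, and the Lagrange multiplier $\lambda$ is introduced here to enforce normalisation of p.

$$
\begin{aligned}
L(p, \lambda) &= \sum_{k=1}^{K} \sum_i p_i \log \frac{p_i}{y_i^{(k)}} + \lambda \Big( \sum_i p_i - 1 \Big), \\
\frac{\partial L}{\partial p_i} &= K \left( \log p_i + 1 \right) - \sum_{k=1}^{K} \log y_i^{(k)} + \lambda = 0, \\
\log p_i &= \frac{1}{K} \sum_{k=1}^{K} \log y_i^{(k)} - B, \qquad B = 1 + \frac{\lambda}{K}.
\end{aligned}
$$

Requiring $\sum_i p_i = 1$ fixes $B = \log \sum_j \exp\big( \frac{1}{K} \sum_k \log y_j^{(k)} \big)$. Since the criterion is convex in p, this stationary point is the minimum, and the merged distribution is the normalised geometric mean of the model outputs.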
Table 1: Merging results for the ARPA 1993 spoke 5 development test, 1993
spoke 6 development test, and the 1993 hub 2 evaluation test. All tests
utilised a 5,000 word vocabulary and a bigram language model and were
trained using the SI-84 training set.
Whilst the exact gains are task specific, it is generally found that linear merging of four networks provides about 17% fewer errors. The log-domain merging performs better, with approximately 24% fewer errors when four networks are combined.
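The log-domain merge described above can be sketched in a few lines, together with a numerical check that it attains a lower value of the Kullback-Leibler criterion than the linear average. This is only an illustration: the function names `log_domain_merge` and `kl_criterion`, the small `eps` guard against log(0), and the example probabilities are assumptions, and the criterion is taken to be the sum over models of the Kullback-Leibler information from p to each model's output.

```python
import numpy as np

def log_domain_merge(outputs, eps=1e-12):
    """Normalised geometric mean of K models' outputs.

    outputs: array-like of shape (K, ..., I) with the class index last
    (an assumed layout). eps is an assumed guard against log(0).
    """
    outputs = np.asarray(outputs, dtype=float)
    mean_log = np.log(outputs + eps).mean(axis=0)
    # B is the log of the normaliser, computed with the usual
    # max-subtraction trick for numerical stability.
    m = mean_log.max(axis=-1, keepdims=True)
    B = m + np.log(np.exp(mean_log - m).sum(axis=-1, keepdims=True))
    return np.exp(mean_log - B)

def kl_criterion(p, outputs, eps=1e-12):
    """Sum over models of D(p || y^(k)) for a single frame.

    p has shape (I,), outputs has shape (K, I).
    """
    p = np.asarray(p, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(outputs + eps))))

# Two hypothetical models, one frame, three classes.
y = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
])
log_merged = log_domain_merge(y)
lin_merged = y.mean(axis=0)

e_log = kl_criterion(log_merged, y)
e_lin = kl_criterion(lin_merged, y)
```

Here `e_log` should not exceed `e_lin`, since the log-domain merge is the minimiser of this criterion over probability distributions. The check says nothing about recognition accuracy, which depends on the task.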