Given *N* samples of speech, we would like to compute estimates of the
parameters $a_k$ that result in the best fit. One reasonable way to define
``best fit'' is in terms of mean squared error. These estimates can also be
regarded as the ``most probable'' parameters if it is assumed that the
distribution of errors is Gaussian and that a priori there were no
restrictions on the values of $a_k$.

The error at any time, $e_n$, is:

$$e_n = s_n - \hat{s}_n = s_n - \sum_{k=1}^{p} a_k s_{n-k}$$

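Hence the summed squared error, *E*, over a finite window of length *N* is:

$$E = \sum_{n=0}^{N-1} e_n^2 = \sum_{n=0}^{N-1} \left( s_n - \sum_{k=1}^{p} a_k s_{n-k} \right)^2 \tag{67}$$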
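As a minimal sketch of the summed squared error, the function below evaluates *E* for a trial set of coefficients. It assumes the usual linear predictor, in which each sample is predicted as a weighted sum of the previous *p* samples. The function name and the storage convention (the first *p* list entries hold the history that precedes the window) are choices made for this illustration, not part of the original notes.

```python
def prediction_error_energy(samples, coeffs):
    """Summed squared prediction error E for coefficients a_1..a_p.

    Assumed layout: samples holds s_{-p}, ..., s_{N-1}, so the first p
    entries are history and samples[p + n] is s_n.
    """
    p = len(coeffs)
    total = 0.0
    for n in range(p, len(samples)):
        # Predict the current sample from the p previous samples.
        prediction = sum(coeffs[k - 1] * samples[n - k] for k in range(1, p + 1))
        error = samples[n] - prediction
        total += error * error
    return total


samples = [1.0, 2.0, 4.0, 8.0, 16.0]
print(prediction_error_energy(samples, [2.0]))  # 0.0: each sample is twice the last
print(prediction_error_energy(samples, [1.0]))  # 85.0 = 1 + 4 + 16 + 64
```

The doubling sequence is predicted exactly by a single coefficient of 2, so the error vanishes there and grows for any other choice.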

The minimum of *E* occurs when the derivative with respect to each of
the parameters, $a_k$, is zero. As can be seen from equation 67, the
value of *E* is quadratic in each of the $a_k$, so setting the
derivatives to zero gives a set of linear equations with a single
solution (provided those equations are not degenerate). Very large
positive or negative values of $a_k$ must lead to poor prediction, and
hence the solution to $\partial E / \partial a_k = 0$ must be a minimum.

**Figure 38:** Schematic showing single minimum of a quadratic
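To make the single-minimum argument concrete, consider the simplest case of one parameter, $p = 1$, assuming the predictor $\hat{s}_n = a_1 s_{n-1}$. This is a worked illustration rather than part of the original derivation. Expanding the squared error gives a parabola in $a_1$:

$$E(a_1) = \sum_{n=0}^{N-1} (s_n - a_1 s_{n-1})^2 = \sum_{n=0}^{N-1} s_n^2 \; - \; 2 a_1 \sum_{n=0}^{N-1} s_n s_{n-1} \; + \; a_1^2 \sum_{n=0}^{N-1} s_{n-1}^2$$

The coefficient of $a_1^2$ is a sum of squares and so cannot be negative, which means the parabola opens upwards. Setting $dE/da_1 = 0$ gives the single stationary point

$$a_1 = \frac{\sum_{n=0}^{N-1} s_n s_{n-1}}{\sum_{n=0}^{N-1} s_{n-1}^2}$$

which is a minimum provided the denominator is non-zero.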

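Hence differentiating equation 67 with respect to $a_j$
and setting equal to zero gives the set of *p* equations:

$$\frac{\partial E}{\partial a_j} = -2 \sum_{n=0}^{N-1} \left( s_n - \sum_{k=1}^{p} a_k s_{n-k} \right) s_{n-j} = 0, \qquad j = 1, \ldots, p \tag{69}$$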

Rearranging equation 69 gives:

$$\sum_{n=0}^{N-1} s_n s_{n-j} = \sum_{k=1}^{p} a_k \sum_{n=0}^{N-1} s_{n-k} s_{n-j} \tag{70}$$

Define the covariance matrix, $\Phi$, with elements $\phi_{j,k}$:

$$\phi_{j,k} = \sum_{n=0}^{N-1} s_{n-j} s_{n-k}$$
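The covariance elements can be computed directly from their definition. The sketch below takes each element $\phi_{j,k}$ to be the sum of $s_{n-j} s_{n-k}$ over the window $n = 0, \ldots, N-1$, and returns all elements for $j, k = 0, \ldots, p$, so that column 0 holds the right-hand-side terms as well. The function name and the list layout (with *p* history samples stored first) are assumptions of this example.

```python
def covariance_elements(samples, p):
    """Return phi as a (p+1) x (p+1) nested list, phi[j][k] = sum_n s_{n-j} s_{n-k}.

    Assumed layout: samples holds s_{-p}, ..., s_{N-1}, so samples[p + n] is s_n.
    """
    n_window = len(samples) - p
    phi = [[0.0] * (p + 1) for _ in range(p + 1)]
    for j in range(p + 1):
        for k in range(j, p + 1):
            value = sum(samples[p + n - j] * samples[p + n - k]
                        for n in range(n_window))
            phi[j][k] = value
            phi[k][j] = value  # the product is symmetric in j and k
    return phi


print(covariance_elements([1.0, 2.0, 3.0, 4.0], p=1))
# [[29.0, 20.0], [20.0, 14.0]]
```

Only the upper triangle is summed explicitly. The lower triangle is filled in by symmetry, which roughly halves the work.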

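Now we can write equation 70 as:

$$\phi_{j,0} = \sum_{k=1}^{p} \phi_{j,k} \, a_k, \qquad j = 1, \ldots, p$$

or in matrix form:

$$\begin{bmatrix} \phi_{1,0} \\ \phi_{2,0} \\ \vdots \\ \phi_{p,0} \end{bmatrix} = \begin{bmatrix} \phi_{1,1} & \phi_{1,2} & \cdots & \phi_{1,p} \\ \phi_{2,1} & \phi_{2,2} & \cdots & \phi_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{p,1} & \phi_{p,2} & \cdots & \phi_{p,p} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}$$

or simply:

$$\Phi_0 = \Phi \mathbf{a}$$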

Hence the *Covariance method* solution is obtained by matrix
inverse:

$$\mathbf{a} = \Phi^{-1} \Phi_0$$
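A minimal end-to-end sketch of the covariance method follows. It builds the *p* equations from the covariance sums and solves them by Gaussian elimination with partial pivoting rather than forming the inverse explicitly, which is a substitution made here for numerical convenience. The function name, the sample layout (with *p* history samples stored first) and the singularity threshold are all assumptions of this example.

```python
def covariance_method(samples, p):
    """Estimate a_1..a_p by solving Phi a = Phi_0.

    Assumed layout: samples holds s_{-p}, ..., s_{N-1}, so samples[p + n] is s_n.
    """
    n_window = len(samples) - p

    def phi(j, k):
        return sum(samples[p + n - j] * samples[p + n - k] for n in range(n_window))

    # Augmented system [Phi | Phi_0], one row per equation j = 1..p.
    rows = [[phi(j, k) for k in range(1, p + 1)] + [phi(j, 0)]
            for j in range(1, p + 1)]

    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:  # illustrative threshold
            raise ValueError("covariance matrix is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, p):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, p + 1):
                rows[r][c] -= factor * rows[col][c]

    # Back substitution.
    coeffs = [0.0] * p
    for j in range(p - 1, -1, -1):
        tail = sum(rows[j][k] * coeffs[k] for k in range(j + 1, p))
        coeffs[j] = (rows[j][p] - tail) / rows[j][j]
    return coeffs


# Each Fibonacci number is the sum of the previous two, so the estimates
# should come out close to a_1 = 1 and a_2 = 1.
print(covariance_method([1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0], p=2))
```

Because the test signal obeys a second-order recurrence exactly, the prediction error at the solution is zero up to rounding.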

Note that $\Phi$ is symmetric, i.e. $\phi_{j,k} = \phi_{k,j}$, and that this symmetry can be exploited in inverting $\Phi$ (see [9]).
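One common way to exploit the symmetry of a matrix such as the covariance matrix is a Cholesky factorisation, sketched below. This is offered as an illustration only: it is not necessarily the approach described in [9], and it additionally assumes the matrix is positive definite, which symmetry alone does not guarantee. The function name is hypothetical.

```python
import math


def cholesky_solve(matrix, rhs):
    """Solve matrix x = rhs for a symmetric positive-definite matrix.

    Only the lower triangle of matrix is read.
    """
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]

    # Factorise matrix = lower * lower^T.
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                diag = matrix[i][i] - partial
                if diag <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][i] = math.sqrt(diag)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]

    # Forward substitution: lower y = rhs.
    y = [0.0] * size
    for i in range(size):
        y[i] = (rhs[i] - sum(lower[i][m] * y[m] for m in range(i))) / lower[i][i]

    # Back substitution: lower^T x = y.
    x = [0.0] * size
    for i in range(size - 1, -1, -1):
        tail = sum(lower[m][i] * x[m] for m in range(i + 1, size))
        x[i] = (y[i] - tail) / lower[i][i]
    return x


# Covariance sums for the sequence 1, 1, 2, 3, 5, 8, 13, 21 with p = 2;
# the solution should be close to [1.0, 1.0].
print(cholesky_solve([[272.0, 168.0], [168.0, 104.0]], [440.0, 272.0]))
```

The factorisation touches each symmetric pair of elements once, so it needs roughly half the arithmetic of a general elimination.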

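These equations reference the samples $s_{-p}, \ldots, s_{N-1}$, so the *p* samples immediately preceding the window are needed as well as the *N* samples inside it.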

Speech Vision Robotics group/Tony Robinson