
Empirical Risk Minimisation

Suppose there is a learning machine with adjustable parameters $\alpha$. Given the above classification task, the machine tunes its parameters $\alpha$ to learn the mapping $\mathbf{x} \mapsto y$. Each choice of $\alpha$ yields a possible mapping $\mathbf{x} \mapsto f(\mathbf{x}, \alpha)$, which defines that particular learning machine. The performance of this machine can be measured by the expectation of the test error, given in the following equation.

  $$ R(\alpha) = \int \tfrac{1}{2}\, \lvert y - f(\mathbf{x}, \alpha) \rvert \; dP(\mathbf{x}, y) $$

This is called the expected risk or actual risk. Evaluating it requires at least an estimate of $P(\mathbf{x}, y)$, which is not available for most classification tasks. Hence, one must settle for the empirical risk, defined in the following equation, which is simply the mean error over the available training data.

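  $$ R_{emp}(\alpha) = \frac{1}{2l} \sum_{i=1}^{l} \lvert y_i - f(\mathbf{x}_i, \alpha) \rvert $$

where $l$ is the number of training samples $(\mathbf{x}_i, y_i)$.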
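The mean training error described above can be computed directly. The following is a minimal sketch, assuming labels in {-1, +1} and a per-sample loss of half the absolute difference between label and prediction, so that each misclassification costs 1. The names `empirical_risk` and `threshold_classifier`, and the toy data, are hypothetical and serve only as illustration.

```python
from typing import Callable, Sequence


def empirical_risk(
    f: Callable[[float], int],
    xs: Sequence[float],
    ys: Sequence[int],
) -> float:
    """Mean of 0.5 * |y_i - f(x_i)| over the training set.

    Assumes labels and predictions take values in {-1, +1}, so each
    misclassified sample contributes exactly 1 and each correct one 0.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum(0.5 * abs(y - f(x)) for x, y in zip(xs, ys)) / len(xs)


def threshold_classifier(alpha: float) -> Callable[[float], int]:
    """Hypothetical one-parameter machine: predict +1 when x >= alpha."""
    return lambda x: 1 if x >= alpha else -1


# Toy training set (made up for illustration).
xs = [0.1, 0.4, 0.6, 0.9]
ys = [-1, 1, -1, 1]

risk = empirical_risk(threshold_classifier(0.5), xs, ys)
print(risk)  # two of the four samples are misclassified
```

Note that this quantity depends only on the training samples; no knowledge of the underlying distribution is needed, which is what makes it computable in practice.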

Most training algorithms for learning machines implement Empirical Risk Minimisation (ERM), i.e. they minimise the empirical error using Maximum Likelihood estimation of the parameters $\alpha$. These conventional training algorithms do not take the capacity of the learning machine into account, and this can result in overfitting, i.e. using a learning machine with too much capacity for a particular problem.
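The effect of ignoring capacity can be illustrated on a toy problem. The sketch below is an illustration under stated assumptions rather than any particular training algorithm. It minimises the empirical risk by direct search instead of Maximum Likelihood estimation, the data are hand-made with one deliberately mislabelled training point, and the two machines (a one-parameter threshold rule and a 1-nearest-neighbour rule standing in for a high-capacity machine) are hypothetical choices.

```python
def empirical_risk(f, xs, ys):
    """Mean of 0.5 * |y - f(x)| for labels in {-1, +1}."""
    return sum(0.5 * abs(y - f(x)) for x, y in zip(xs, ys)) / len(xs)


def threshold_classifier(alpha):
    """Low-capacity machine: predict +1 when x >= alpha."""
    return lambda x: 1 if x >= alpha else -1


def nearest_neighbour(train_x, train_y):
    """High-capacity machine: copy the label of the closest training point."""
    def f(x):
        i = min(range(len(train_x)), key=lambda j: abs(x - train_x[j]))
        return train_y[i]
    return f


# Hand-made data: the underlying rule is y = +1 for x >= 0.5,
# but the training point at x = 0.35 carries a wrong (noisy) label.
train_x = [0.05, 0.2, 0.35, 0.4, 0.45, 0.55, 0.7, 0.8, 0.95]
train_y = [-1, -1, 1, -1, -1, 1, 1, 1, 1]
test_x = [0.1, 0.3, 0.33, 0.37, 0.6, 0.9]
test_y = [-1, -1, -1, -1, 1, 1]

# ERM by direct search over candidate thresholds (the training inputs).
best_alpha = min(
    train_x,
    key=lambda a: empirical_risk(threshold_classifier(a), train_x, train_y),
)
simple = threshold_classifier(best_alpha)
flexible = nearest_neighbour(train_x, train_y)

simple_train = empirical_risk(simple, train_x, train_y)
simple_test = empirical_risk(simple, test_x, test_y)
flexible_train = empirical_risk(flexible, train_x, train_y)
flexible_test = empirical_risk(flexible, test_x, test_y)

print("threshold: train", simple_train, "test", simple_test)
print("1-NN:      train", flexible_train, "test", flexible_test)
```

On this toy data the nearest-neighbour rule reaches zero empirical risk by reproducing the mislabelled point, yet it misclassifies the test points near that point, whereas the threshold rule accepts one training error and classifies every test point correctly. The test set here is deliberately concentrated around the noisy point, so the size of the gap should not be read as typical.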



K.K. Chin
Thu Sep 10 11:05:30 BST 1998