Suppose there is a learning machine with adjustable parameters α.
Given the above classification task, the machine will tune its
parameters α to learn the mapping x → y. This will result in a
possible mapping x → f(x, α), which defines this particular learning
machine. The
performance of this machine can be measured by the expectation of the test
error, as shown in Eqn.
.
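As a sketch, assuming labels y in {-1, +1} and writing f(x, α) for the machine's output (notation assumed here, following the usual statistical learning theory convention), the expectation of the test error is commonly written as:

```latex
R(\alpha) = \int \tfrac{1}{2} \, \lvert y - f(x, \alpha) \rvert \, dP(x, y)
```

With labels in {-1, +1}, the integrand is 1 for a misclassified point and 0 otherwise, so the integral is the probability of error under P(x,y).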
This is called the expected risk or actual risk. Computing it requires
at least an estimate of P(x,y), which is not available for most
classification tasks. Hence, one must settle for the empirical risk
measure, which
is defined in Eqn.
. This is just a measure of the mean
error over the available training data.
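The empirical risk can be computed directly from the training data. The following is a minimal sketch, assuming labels in {-1, +1} and a machine output f(x, α); the names empirical_risk and linear_machine, the linear decision rule, and the toy data are all hypothetical and chosen only for illustration.

```python
def empirical_risk(f, alpha, samples):
    """Mean error of f(x, alpha) over (x, y) pairs with y in {-1, +1}."""
    # For labels in {-1, +1}, 0.5 * |y - f(x, alpha)| is 1 on a mistake, else 0.
    errors = [0.5 * abs(y - f(x, alpha)) for x, y in samples]
    return sum(errors) / len(errors)


def linear_machine(x, alpha):
    """Hypothetical learning machine: sign of a linear score, alpha = (w, b)."""
    w, b = alpha
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy training set (assumed data): five labelled points in the plane.
samples = [
    ((0.0, 1.0), -1),
    ((1.0, 2.0), -1),
    ((2.0, 0.0), 1),
    ((3.0, 1.0), 1),
    ((1.0, 0.5), -1),  # this point lies on the wrong side of the boundary
]
alpha = ((1.0, -1.0), 0.0)  # one particular parameter setting (w, b)

risk = empirical_risk(linear_machine, alpha, samples)
print(risk)
```

Here one of the five training points is misclassified, so the empirical risk for this parameter setting is one fifth. No knowledge of P(x,y) is needed, which is why this measure is the practical substitute for the expected risk.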
Most training algorithms for learning machines implement Empirical Risk
Minimisation (ERM), i.e. minimise the empirical error using
Maximum Likelihood estimation for the parameters α. These
conventional training algorithms do not consider the capacity of the
learning machine, and this can result in overfitting, i.e. using a
learning machine with too much capacity for a particular problem.
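The effect of capacity can be illustrated with a toy experiment. The sketch below is not a method from the text: it assumes a one-dimensional problem whose true rule is the sign of x, a training set with one deliberately mislabelled point, a low-capacity machine (a single threshold chosen by ERM from a fixed candidate list) and a high-capacity machine (a nearest-neighbour rule that memorises the training set). All names and data are hypothetical.

```python
def empirical_risk(f, alpha, samples):
    """Mean error of f(x, alpha) over (x, y) pairs with y in {-1, +1}."""
    errors = [0.5 * abs(y - f(x, alpha)) for x, y in samples]
    return sum(errors) / len(errors)


def threshold_machine(x, t):
    """Low capacity: one adjustable parameter, the threshold t."""
    return 1 if x >= t else -1


def memorising_machine(x, stored):
    """High capacity: returns the label of the nearest stored training point."""
    return min(stored, key=lambda s: abs(s[0] - x))[1]


# Assumed toy data: true rule is sign(x); the last training label is noise.
train = [(-3.0, -1), (-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1), (3.0, -1)]
test = [(-2.6, -1), (-1.2, -1), (-0.8, -1), (0.9, 1), (1.6, 1), (2.8, 1)]

# ERM for the threshold machine over a small set of candidate thresholds.
candidates = [-2.5, -1.5, 0.0, 1.5, 2.5]
best_t = min(candidates, key=lambda t: empirical_risk(threshold_machine, t, train))

low_train = empirical_risk(threshold_machine, best_t, train)
low_test = empirical_risk(threshold_machine, best_t, test)
high_train = empirical_risk(memorising_machine, train, train)
high_test = empirical_risk(memorising_machine, train, test)

print("threshold machine:  train", low_train, "test", low_test)
print("memorising machine: train", high_train, "test", high_test)
```

On this data the memorising machine reaches zero training error by also fitting the mislabelled point, and then misclassifies the nearby test point, while the threshold machine accepts one training error and classifies every test point correctly. A lower empirical risk therefore does not by itself imply a lower error on unseen data.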