VC confidence is defined as the second term on the right hand side of
Eqn.
.
is chosen such that
,
and the bound in Eqn.
will hold with probability
. h is the VC dimension and is a measure of the notion of
capacity of a learning machine.
VC dimension can be defined for various classes of learning
machines. It is defined as the maximum number of points that can be
``shatter'' by the machine. Considering the two-class problem as defined
in the beginning of this chapter, a set of l points have
possible label
assignments. If a learning machine with the mapping
can
correctly assign all possible labels, then that set of points
is shattered by this learning machine. It can be shown that a machine
with oriented hyper-planes in
as its mapping function, has a
VC dimension of N+1.
Figure
shows that using these bound on the
expected risk. It is clear that a learning machine with large capacity
(hence large VC dimension) will give low empirical risk but the VC
confidence interval is also large for that learning machine, i.e the
machine does not generalise well. By measuring this bound, one can
select a learning machine that give the lowest expected risk. The
selected machine will give better generalisation.
Figure: The bound on actual risk of a classifier or learning machine
In [Vapnik, 1995], it is stressed that the VC dimension and hence the
capacity does not depend on the number of free parameters directly,
i.e. a learning machine with a larger number of free parameter need not
have larger capacity. And the reverse is also true. The machine with
mapping function
, where
has an infinite VC dimension.