Finally the most important question is, ``does SVM training actually
implement SRM principle ?'' and if yes, ``how ?'' i.e. can one prove that
SVM training actually minimises the VC dimension and empirical error
at the same time. From Section
, the VC dimension of a
nonlinear SVM
is N +1, where N is the dimensionality of space
. In [Burges, 1998], the dimensionality of space
for a p-degree polynomial and RBF kernel is shown to be
and
respectively, where
is the
dimensionality of the original low dimension space. Hence SVM could
have a very high VC dimension.
[Vapnik, 1995] stated that the VC dimension is bounded by the inequality
in Eqn.
. The influence of A on the capacity of
the SVM classifier is shown in Figure
.
All training vectors are bounded by a sphere of radius R, and
norm of w (Eqn.
) is bound by A, i.e.
.
R can be found by first estimating the centre of the sphere that
encloses all the training data. This is a convex QP problem, which can
be solve using very similar techniques to those used in the SVM training,
i.e. using Lagrangian and prima-dual formulation. [Burges, 1998]
shows that the centre is given by Eqn.
(
is the
Lagrange multiplier). Then R is computed using Eqn.
.
Figure: Constraining the hyper-planes to remain outside the spheres of
radius
around each data point
By analysing the VC dimension of a gap-tolerant classifier
,
[Burges, 1998] claimed that Eqn.
actually gives
the VC dimension of a gap-tolerant classifier but not the VC dimension
of a SVM classifier.