The above algorithm can be extended to non-separable data. The correct
classification constraints in Eqn.
is revised by adding a
slack variable
to Eqn.
. This will allow some
points to be misclassified (Figure
). The training
algorithm will need to minimise the cost function in
Eqn.
, i.e. a
trade-off between maximum margin and classification error.
The selection of C and k define the cost of constraint violation.
[Cortes and Vapnik, 1995] define a more general error function. The choice of C
and its effect will be discussed in detail in Chapter
.
For positive integers k, the above optimisation problem is a convex
programming problem (if k=1 or 2, it is a QP problem). For
simplicity, k is usually set to 1. Using a similar approach as in the
separable case, the training results in a similar QP problem
(Eqn.
) but now the
value is bound by C,
Eqn.
. This bound will limit the search space for the
QP problem , i.e. the possible range for the
value.
A larger search space will generally slow down the QP
optimiser. Results in Section
(Table
)
show that this is not the case. The optimiser used in this project
(Chapter
) is able to home in to the optimal solution
regardless of the size of the search space.
It can be proved that, for any misclassified training data,
, the
corresponding
must be at the upper bound. This can be
understood by imagining that a particular data point is trying to assert
a stronger influence on the boundary so that it can be classified
correctly, by increasing the
value. When the
value
reaches its maximum bound, it cannot increase its influence further,
hence this point will stay misclassified. This analogy is consistent
with the fact that C, the upper bound for
, is the trade-off
between maximum margin and classification error. A higher C value
will give a larger penalty for classification error, and this means
is allowed to have a larger value; hence, each misclassified
data can assert a stronger influence on the boundary.
Figure: Handling non-separable case with slack variables