Using upper bounds on attainable discrimination to select discrete valued features
D. R. Lovell, C. R. Dance, M. Niranjan, R. W. Prager and K. J. Dalton
Selection of features that will permit accurate pattern
classification is, in general, a difficult task. However, if a
particular data set is represented by discrete-valued features, it
becomes possible to determine empirically the contribution that each
feature makes to the discrimination between classes. We describe how
to calculate the maximum discrimination possible in a two-alternative
forced-choice decision problem when discrete-valued features are used
to represent a given data set. (In this paper, we measure
discrimination in terms of the area under the receiver operating
characteristic (ROC) curve.) Since this bound corresponds to the upper
limit of classification performance achievable by any classifier with
that data representation, we can use it to assess whether recognition
errors are due to a lack of separability in the data or to
shortcomings in the classification technique. In comparison to the
training and testing of artificial neural networks, the empirical
bound on discrimination can be efficiently calculated, allowing an
experimenter to decide whether subsequent development of neural
network models is warranted.
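As an illustration of the idea, the following Python sketch computes such a resubstitution bound: it groups the training examples by their distinct discrete feature vector, scores each cell by its empirical positive rate (equivalent to ranking by likelihood ratio), and returns the AUC of that scoring, which no classifier using the same representation can exceed on that data. The function name, tie handling, and exact counting scheme are assumptions made for this sketch rather than the paper's stated procedure.

from collections import defaultdict

def max_attainable_auc(features, labels):
    # Count positive and negative examples in each cell, i.e. each
    # distinct discrete feature vector.
    pos = defaultdict(int)
    neg = defaultdict(int)
    for x, y in zip(features, labels):
        (pos if y == 1 else neg)[tuple(x)] += 1

    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    cells = set(pos) | set(neg)

    # The best possible scoring ranks cells by their empirical positive
    # rate; positives tied with negatives in the same cell count 1/2.
    ordered = sorted(cells,
                     key=lambda c: pos[c] / (pos[c] + neg[c]),
                     reverse=True)

    auc, neg_seen = 0.0, 0
    for c in ordered:
        # Positives in this cell outrank negatives in all later cells
        # and tie with negatives in this cell.
        auc += pos[c] * (n_neg - neg_seen - 0.5 * neg[c])
        neg_seen += neg[c]
    return auc / (n_pos * n_neg)

For example, max_attainable_auc([[0, 1], [0, 1], [1, 0]], [1, 0, 0]) returns 0.75: the single positive example shares a cell with one negative (counting 1/2) and outranks the other.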
We extend the discrimination bound method to estimate both the maximum
and the average discrimination that can be expected on unseen test
data. These estimation techniques form the basis of a backwards
elimination algorithm that can be used to rank features in order of
their discriminative power. We use two problems to demonstrate this
feature selection process: classification of the Mushroom Database,
and a real-world, pregnancy-related medical risk prediction task, the
assessment of the risk of perinatal death.
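A backwards elimination procedure of this kind can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm: it repeatedly removes the feature whose removal reduces the discrimination bound least, using the resubstitution bound max_attainable_auc above in place of the estimated test-set bounds, and the function name rank_features is introduced here for the example.

def rank_features(features, labels, bound=max_attainable_auc):
    remaining = list(range(len(features[0])))
    removed = []                # least discriminative features come out first
    while remaining:
        best_auc, to_drop = -1.0, None
        for f in remaining:
            kept = [g for g in remaining if g != f]
            # Bound attainable with feature f removed; an empty feature
            # set carries no information, so its bound is chance (0.5).
            reduced = [[x[g] for g in kept] for x in features]
            auc = bound(reduced, labels) if kept else 0.5
            if auc > best_auc:
                best_auc, to_drop = auc, f
        remaining.remove(to_drop)
        removed.append(to_drop)
    return removed[::-1]        # most discriminative feature first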