On the use of Expected Attainable Discrimination for feature
selection in large-scale medical risk prediction problems
D. R. Lovell, M. J. J. Scott, M. Niranjan, R. W. Prager,
K. J. Dalton and R. Derom
This report investigates the use of expected attainable
discrimination (EAD) as a measure for selecting discrete-valued features
in two-class prediction problems. In essence, EAD tells us the
performance we could expect to achieve with a simple histogram
probability density model of a given dataset. For discrete-valued
features, this kind of density model is bias-free but can have
large variance. Given insufficient training data, such a model's
test-set performance will be lower than that of a suitably biased
model. In light of this, we explore how useful EAD is for feature
selection.
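A minimal sketch of the idea, assuming that "attainable discrimination" is read as the area under the ROC curve that a histogram model achieves on the data it was fitted to. This is a simplification: the report's EAD is an *expected* quantity whose exact definition is not given in this abstract, so the resubstitution scoring and the functions `histogram_auc` and `greedy_select` are hypothetical illustrations, not the authors' method. Each distinct combination of the selected discrete feature values is treated as one histogram cell.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]


def histogram_auc(rows: Sequence[Row], labels: Sequence[int], features: Sequence[int]) -> float:
    """Hypothetical: ROC area of a histogram model scored on its own training data."""
    pos: Counter = Counter()  # positive-class counts per histogram cell
    neg: Counter = Counter()  # negative-class counts per histogram cell
    for row, y in zip(rows, labels):
        cell = tuple(row[j] for j in features)
        if y:
            pos[cell] += 1
        else:
            neg[cell] += 1
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Empirical P(y = 1 | cell). Ranking cells by this is equivalent to
    # ranking them by the histogram likelihood ratio.
    score = {c: pos[c] / (pos[c] + neg[c]) for c in set(pos) | set(neg)}

    # AUC = P(score of a random positive > score of a random negative),
    # counting ties as one half (Mann-Whitney form).
    wins = 0.0
    for a, n_a in pos.items():
        for b, n_b in neg.items():
            if score[a] > score[b]:
                wins += n_a * n_b
            elif score[a] == score[b]:
                wins += 0.5 * n_a * n_b
    return wins / (n_pos * n_neg)


def greedy_select(rows: Sequence[Row], labels: Sequence[int], n_features: int, k: int) -> List[int]:
    """Hypothetical forward selection: repeatedly add the feature that most raises histogram_auc."""
    chosen: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda j: histogram_auc(rows, labels, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    # Toy data (illustrative only): feature 0 separates the classes, feature 1 does not.
    rows = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (0, 0)]
    labels = [0, 0, 1, 1, 1, 0]
    print(histogram_auc(rows, labels, [0]))            # 1.0
    print(round(histogram_auc(rows, labels, [1]), 3))  # 0.667
    print(greedy_select(rows, labels, 2, 1))           # [0]
```

Scored this way, on training data the histogram ROC area should not decrease when a feature is added, because the finer partition's likelihood-ratio ranking is at least as good on those data as the coarser one. With many features and limited data, such a score therefore favours exactly the high-variance models the abstract warns about; an estimate that accounts for sampling variability, such as an expected or held-out value, is needed to counter this.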
Keywords: Feature selection, area under the receiver
operating characteristic (ROC) curve, medical risk prediction, obstetrics.