Math @ Duke

Publications [#326885] of Robert Calderbank
Papers Published
 Nokleby, M; Beirami, A; Calderbank, R, Ratedistortion bounds on Bayes risk in supervised learning,
IEEE International Symposium on Information Theory  Proceedings, vol. 2016August
(August, 2016),
pp. 20992103, ISBN 9781509018062 [doi]
(last updated on 2018/03/21)
Abstract: © 2016 IEEE. An informationtheoretic framework is presented for estimating the number of labeled samples needed to train a classifier in a parametric Bayesian setting. Ideas from ratedistortion theory are used to derive bounds for the average L 1 or L ∞ distance between the learned classifier and the true maximum a posteriori classifier in terms of familiar informationtheoretic quantities and the number of training samples available. The maximum a posteriori classifier is viewed as a random source, labeled training data are viewed as a finiterate encoding of the source, and the L 1 or L ∞ Bayes risk is viewed as the average distortion. The result is a framework dual to the wellknown probably approximately correct (PAC) framework. PAC bounds characterize worstcase learning performance of a family of classifiers whose complexity is captured by the VapnikChervonenkis (VC) dimension. The ratedistortion framework, on the other hand, characterizes the averagecase performance of a family of data distributions in terms of a quantity called the interpolation dimension, which represents the complexity of the family of data distributions. The resulting bounds do not suffer from the pessimism typical of the PAC framework, particularly when the training set is small.


dept@math.duke.edu
ph: 919.660.2800
fax: 919.660.2821
 
Mathematics Department
Duke University, Box 90320
Durham, NC 277080320

