Department of Mathematics
Publications [#385431] of Cynthia D. Rudin

Papers Published

  1. Babbar, V; Guo, Z; Rudin, C, “What is Different Between These Datasets?” A Framework for Explaining Data Distribution Shifts, Journal of Machine Learning Research, vol. 26 (January, 2025)
    (last updated on 2026/01/15)

    Abstract:
    The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-related challenges. A common issue arises when curating training data or deploying models: two datasets from the same domain may exhibit differing distributions. While many techniques exist for detecting such distribution shifts, there is a lack of comprehensive methods to explain these differences in a human-understandable way beyond opaque quantitative metrics. To bridge this gap, we propose a versatile framework of interpretable methods for comparing datasets. Using a variety of case studies, we demonstrate the effectiveness of our approach across diverse data modalities (including tabular data, text data, images, and time-series signals) in both low- and high-dimensional settings. These methods complement existing techniques by providing actionable and interpretable insights to better understand and address distribution shifts.

 
