Department of Mathematics


Publications [#369689] of John Harer

Papers Published

  1. Smith, AD; Bendich, P; Harer, J, PERSISTENT OBSTRUCTION THEORY FOR A MODEL CATEGORY OF MEASURES WITH APPLICATIONS TO DATA MERGING, Transactions of the American Mathematical Society Series B, vol. 8 no. 1 (February, 2021), pp. 1-38
    (last updated on 2024/04/23)

    Abstract:
    Collections of measures on compact metric spaces form a model category (“data complexes”), whose morphisms are marginalization integrals. The fibrant objects in this category represent collections of measures in which there is a measure on a product space that marginalizes to any measures on pairs of its factors. The homotopy and homology for this category allow measurement of obstructions to finding measures on larger and larger product spaces. The obstruction theory is compatible with a fibrant filtration built from the Wasserstein distance on measures. Despite the abstract tools, this is motivated by a widespread problem in data science. Data complexes provide a mathematical foundation for semi-automated data-alignment tools that are common in commercial database software. Practically speaking, the theory shows that database JOIN operations are subject to genuine topological obstructions. Those obstructions can be detected by an obstruction cocycle and can be resolved by moving through a filtration. Thus, any collection of databases has a persistence level, which measures the difficulty of JOINing those databases. Because of its general formulation, this persistent obstruction theory also encompasses multi-modal data fusion problems, some forms of Bayesian inference, and probability couplings.
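
    The central idea of the abstract, that measures on pairs of factors need not come from a single measure on the full product, can be illustrated with a small sketch. The example below is not taken from the paper: it is a standard toy case with three binary variables, and the helper names (`marginal`, `allowed_support`) are hypothetical. It does not compute the paper's obstruction cocycle. It only checks a necessary condition: a joint measure must vanish wherever one of its pairwise marginals vanishes, so an empty allowed support rules out any joint.

    ```python
    from itertools import product


    def marginal(pair_measure, axis):
        """Marginalize a measure on a product of two factors onto one factor."""
        out = {}
        for key, mass in pair_measure.items():
            out[key[axis]] = out.get(key[axis], 0.0) + mass
        return out


    def allowed_support(p_ab, p_bc, p_ac, values=(0, 1)):
        """Triples (a, b, c) where a joint measure could carry mass.

        A nonnegative joint that marginalizes to the three pairwise measures
        must be zero on any triple whose projection has zero pairwise mass.
        Necessary condition only: a nonempty result does not prove a joint exists.
        """
        return [
            (a, b, c)
            for a, b, c in product(values, repeat=3)
            if p_ab.get((a, b), 0.0) > 0
            and p_bc.get((b, c), 0.0) > 0
            and p_ac.get((a, c), 0.0) > 0
        ]


    # Toy "tables": A equals B, B equals C, but A differs from C.
    p_ab = {(0, 0): 0.5, (1, 1): 0.5}
    p_bc = {(0, 0): 0.5, (1, 1): 0.5}
    p_ac = {(0, 1): 0.5, (1, 0): 0.5}

    # The pairwise measures agree on every shared single factor...
    consistent_on_a = marginal(p_ab, 0) == marginal(p_ac, 0)
    consistent_on_b = marginal(p_ab, 1) == marginal(p_bc, 0)
    consistent_on_c = marginal(p_bc, 1) == marginal(p_ac, 1)

    # ...yet no triple is compatible with all three, so no joint measure exists.
    obstructed = allowed_support(p_ab, p_bc, p_ac)

    # Replacing the last measure with "A equals C" removes the conflict.
    p_ac_compatible = {(0, 0): 0.5, (1, 1): 0.5}
    unobstructed = allowed_support(p_ab, p_bc, p_ac_compatible)
    ```

    In database terms, the three pairwise measures play the role of three two-column tables whose shared columns agree individually, while a three-way JOIN of the first configuration returns no rows.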

 
