Math @ Duke

Publications [#323271] of Sayan Mukherjee
Papers Published
 Zhao, S; Gao, C; Mukherjee, S; Engelhardt, BE, Bayesian group factor analysis with structured sparsity,
Journal of machine learning research : JMLR, vol. 17
(April, 2016),
pp. 1-47
(last updated on 2017/04/01)
Abstract: © 2016 Shiwen Zhao, Chuan Gao, Sayan Mukherjee, and Barbara E. Engelhardt. Latent factor models are the canonical statistical tool for exploratory analyses of low-dimensional linear structure for a matrix of p features across n samples. We develop a structured Bayesian group factor analysis model that extends the factor model to multiple coupled observation matrices; in the case of two observations, this reduces to a Bayesian model of canonical correlation analysis. Here, we carefully define a structured Bayesian prior that encourages both elementwise and columnwise shrinkage and leads to desirable behavior on high-dimensional data. In particular, our model puts a structured prior on the joint factor loading matrix, regularizing at three levels, which enables elementwise sparsity and unsupervised recovery of latent factors corresponding to structured variance across arbitrary subsets of the observations. In addition, our structured prior allows for both dense and sparse latent factors so that covariation among either all features or only a subset of features can be recovered. We use fast parameter-expanded expectation-maximization for parameter estimation in this model. We validate our method on simulated data with substantial structure. We show results of our method applied to three high-dimensional data sets, comparing results against a number of state-of-the-art approaches. These results illustrate useful properties of our model, including i) recovering sparse signal in the presence of dense effects; ii) the ability to scale naturally to large numbers of observations; iii) flexible observation- and factor-specific regularization to recover factors with a wide variety of sparsity levels and percentage of variance explained; and iv) tractable inference that scales to modern genomic and text data sizes.
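To make the "multiple coupled observation matrices" idea concrete, here is a minimal sketch of the generic group factor analysis generative model: each observation matrix (view) is a linear map of shared latent factors plus Gaussian noise, and the stacked loading matrix is zero on views a factor does not touch (columnwise structure) and, for sparse factors, on some rows inside its views (elementwise sparsity). It assumes standard Gaussian loadings, factors, and noise, and it simply fixes the zero pattern by hand; it does not implement the paper's three-level structured prior or its parameter-expanded EM fitting. The function `simulate_group_factor_data` and all of its parameters are hypothetical names chosen for illustration.

```python
import random


def simulate_group_factor_data(n, view_dims, factor_views, density, noise_sd=0.1, seed=0):
    """Draw Y = Lambda X + E, with the views stacked row-wise.

    Illustrative sketch only (hypothetical helper), not the paper's model or prior.
    n:            number of samples (columns of Y)
    view_dims:    feature count p_w of each observation matrix
    factor_views: factor_views[k] is the set of view indices factor k loads on
    density:      density[k] is the fraction of nonzero loadings for factor k
                  inside its active views (1.0 = dense, below 1.0 = sparse)
    """
    rng = random.Random(seed)
    k_total = len(factor_views)
    # Map each row of the stacked matrix to the view it belongs to.
    row_view = [w for w, p in enumerate(view_dims) for _ in range(p)]
    p_total = len(row_view)

    # Loading matrix Lambda (p_total x K). A column stays zero on every view the
    # factor does not load on; sparse factors also drop random rows within views.
    loadings = [[0.0] * k_total for _ in range(p_total)]
    for k in range(k_total):
        for i in range(p_total):
            if row_view[i] in factor_views[k] and rng.random() < density[k]:
                loadings[i][k] = rng.gauss(0.0, 1.0)

    # Latent factors X (K x n), then Y = Lambda X plus independent Gaussian noise.
    factors = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k_total)]
    y = [
        [
            sum(loadings[i][k] * factors[k][j] for k in range(k_total))
            + rng.gauss(0.0, noise_sd)
            for j in range(n)
        ]
        for i in range(p_total)
    ]
    return y, loadings, row_view


# Two views (4 and 3 features): factor 0 is dense and shared by both views,
# factor 1 is dense and specific to view 0, factor 2 is sparse and specific to view 1.
y, loadings, row_view = simulate_group_factor_data(
    n=5, view_dims=[4, 3], factor_views=[{0, 1}, {0}, {1}], density=[1.0, 1.0, 0.5]
)
for row, view in zip(loadings, row_view):
    print(view, ["x" if v != 0.0 else "." for v in row])
```

With exactly two views, a shared factor plus view-specific factors is the setting that, as the abstract notes, reduces to a Bayesian form of canonical correlation analysis. A Bayesian fit would learn this zero pattern from data through shrinkage priors on the loadings rather than fixing it in advance, as this simulation does.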


dept@math.duke.edu
ph: 919.660.2800
fax: 919.660.2821
 
Mathematics Department
Duke University, Box 90320
Durham, NC 27708-0320

