Math @ Duke

Publications [#257990] of David B. Dunson
Papers Published
 Chen, B; Chen, M; Paisley, J; Zaas, A; Woods, C; Ginsburg, GS; Hero, A; Lucas, J; Dunson, D; Carin, L, Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies,
BMC Bioinformatics, vol. 11
(November, 2010),
pp. 552 [21062443], [doi]
(last updated on 2018/05/25)
Abstract:

BACKGROUND: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis.

RESULTS: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches.

CONCLUSIONS: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.



