Math @ Duke

Publications [#258057] of David B. Dunson
Papers Published
Ni, K; Paisley, J; Carin, L; Dunson, D, Multi-task learning for analyzing and sorting large databases of sequential data,
IEEE Transactions on Signal Processing, vol. 56, no. 8, Part II
(2008),
pp. 3918-3931, ISSN 1053-587X
(last updated on 2018/06/24)
Abstract: A new hierarchical nonparametric Bayesian framework is proposed for the problem of multi-task learning (MTL) with sequential data. The models for multiple tasks, each characterized by sequential data, are learned jointly, and the inter-task relationships are inferred simultaneously. This MTL setting is used to analyze and sort large databases composed of sequential data, such as music clips. Within each data set, we represent the sequential data with an infinite hidden Markov model (iHMM), which avoids the problem of model selection (choosing the number of states). Across the data sets, the multiple iHMMs are learned jointly in an MTL setting by employing a nested Dirichlet process (nDP). The nDP-iHMM MTL method allows simultaneous task-level and data-level clustering, through which the individual iHMMs are enhanced and the between-task similarities are learned. Therefore, in addition to improved learning of each model via appropriate data sharing, the learned sharing mechanisms are used to infer inter-data relationships of interest for data search. Specifically, the MTL-learned task-level sharing mechanisms are used to define the affinity matrix in a graph-diffusion sorting framework. To avoid slow MCMC inference on large databases, the nDP-iHMM is truncated to yield a nested Dirichlet-distribution-based HMM representation, which accommodates fast variational Bayesian (VB) analysis for large-scale inference. The effectiveness of the framework is demonstrated using a database composed of 2500 digital music pieces. © 2008 IEEE.
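The per-task building block in this abstract is an iHMM truncated to a finite HMM with Dirichlet-distributed parameters. The sketch below illustrates only that truncated building block, under assumptions that are not taken from the paper: a truncation level of K states, an initial distribution and transition rows drawn from a symmetric Dirichlet with concentration alpha/K, and discrete observations over M symbols (for example, vector-quantized audio features) with Dirichlet(1, ..., 1) emission rows. The function names `make_truncated_hmm` and `log_likelihood` are hypothetical. The code does not implement the nested DP or the VB updates; it samples one truncated model and scores a sequence with the scaled forward algorithm.

```python
import math
import random


def sample_dirichlet(conc, rng):
    """Draw one Dirichlet vector by normalizing independent Gamma(conc_i, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in conc]
    total = sum(g)
    return [x / total for x in g]


def make_truncated_hmm(k, m, alpha, rng):
    """Hypothetical K-state truncation of an iHMM-style prior.

    Initial distribution and each transition row ~ Dir(alpha/K, ..., alpha/K);
    each emission row over M discrete symbols ~ Dir(1, ..., 1) (assumed emission model).
    """
    pi0 = sample_dirichlet([alpha / k] * k, rng)
    trans = [sample_dirichlet([alpha / k] * k, rng) for _ in range(k)]
    emit = [sample_dirichlet([1.0] * m, rng) for _ in range(k)]
    return pi0, trans, emit


def log_likelihood(obs, pi0, trans, emit):
    """Scaled forward algorithm: returns log p(obs) for a discrete-output HMM."""
    k = len(pi0)
    fwd = [pi0[s] * emit[s][obs[0]] for s in range(k)]
    c = sum(fwd)
    fwd = [f / c for f in fwd]
    ll = math.log(c)
    for o in obs[1:]:
        # Propagate through the transition matrix, then weight by the emission probability.
        fwd = [sum(fwd[r] * trans[r][s] for r in range(k)) * emit[s][o] for s in range(k)]
        c = sum(fwd)  # per-step normalizer; log p(obs) is the sum of their logs
        fwd = [f / c for f in fwd]
        ll += math.log(c)
    return ll


rng = random.Random(0)
pi0, trans, emit = make_truncated_hmm(k=10, m=4, alpha=5.0, rng=rng)
ll = log_likelihood([0, 1, 1, 3, 2, 0], pi0, trans, emit)
```

Because only K states are kept, every quantity is a finite vector or matrix, which is what makes closed-form, VB-style updates practical; the truncation level K is a user choice in this sketch.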

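The abstract states that the task-level sharing learned by the nDP defines the affinity matrix used for graph-diffusion sorting. A minimal sketch, under assumptions not taken from the paper: the affinity W[i][j] is estimated as the fraction of posterior samples in which items i and j are assigned to the same task-level cluster, and the diffusion is a common manifold-ranking iteration f <- alpha * S f + (1 - alpha) * y with S = D^(-1/2) W D^(-1/2), where y marks the query item. The paper's exact diffusion may differ, and the function names and sample labels below are hypothetical.

```python
import math


def coclustering_affinity(samples):
    """Hypothetical task-level affinity: W[i][j] is the fraction of posterior
    samples in which items i and j share a task-level cluster (diagonal left at 0)."""
    n = len(samples[0])
    w = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if i != j and labels[i] == labels[j]:
                    w[i][j] += 1.0
    m = len(samples)
    return [[v / m for v in row] for row in w]


def diffusion_rank(w, query, alpha=0.9, iters=200):
    """Manifold-ranking style diffusion from a single query item.

    Iterates f <- alpha * S f + (1 - alpha) * y with S = D^-1/2 W D^-1/2;
    for 0 < alpha < 1 this converges to (1 - alpha) (I - alpha S)^-1 y.
    Returns item indices sorted by decreasing score, plus the scores.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    s = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    y = [1.0 if i == query else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    ranking = sorted(range(n), key=lambda i: -f[i])
    return ranking, f


# Hypothetical task-level cluster labels for 4 items across 3 posterior samples.
samples = [[0, 0, 1, 1], [0, 0, 1, 2], [0, 0, 0, 1]]
w = coclustering_affinity(samples)
ranking, scores = diffusion_rank(w, query=0)
```

Diffusing over the graph, rather than ranking by the raw affinity row alone, lets an item that is only indirectly linked to the query (here, item 3 through item 2) still receive a nonzero score.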


