© 2018 American Statistical Association. The study of the neurological, genetic, and evolutionary basis of human vocal communication through animal vocalization models is an important field of neuroscience. The datasets typically comprise structured sequences of syllables or “songs” produced by animals of different genotypes under different social contexts. Developing statistical methods that appropriately model the syntax of animal vocal communication has proved difficult. We address this need by developing a novel Bayesian semiparametric framework for inference in such datasets. Our approach is built on a novel class of mixed effects Markov transition models for the songs that accommodate exogenous influences of genotype and context as well as animal-specific heterogeneity. A crucial advantage of the proposed approach is its ability to address key scientific questions about the global and local influences of the exogenous predictors on the transition dynamics via automated tests of hypotheses. The methodology is illustrated using simulation experiments and the motivating application in neuroscience. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.