Publications [#352312] of Lawrence Carin
Papers Published
- Yang, Q; Huo, Z; Wang, W; Huang, H; Carin, L, Ouroboros: On accelerating training of transformer-based language models, Advances in Neural Information Processing Systems, vol. 32 (January, 2019)
(last updated on 2024/12/31)
Abstract: Language models are essential for natural language processing (NLP) tasks such as machine translation and text summarization. Transformer-based language models with over a billion parameters have recently demonstrated remarkable performance across many NLP domains, confirming the benefits of model size. Model parallelism is required when a model is too large to fit on a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds up the training of Transformer-based language models. We also prove that the proposed algorithm is guaranteed to converge to critical points for non-convex problems. Extensive experiments on Transformer and Transformer-XL language models demonstrate that the proposed algorithm achieves a significant speedup beyond data parallelism, with comparable or better accuracy. Code to reproduce the experiments is available at https://github.com/LaraQianYang/Ouroboros.
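For readers unfamiliar with the backward-locking problem the abstract mentions, the following is a minimal sketch of the naive layer-wise model parallelism that the paper improves upon; it is not the Ouroboros algorithm itself. The model class, layer sizes, and two-GPU placement are illustrative assumptions, and the snippet assumes two CUDA devices are available.

```python
# Minimal sketch of naive layer-wise model parallelism in PyTorch.
# NOT the Ouroboros algorithm from the paper; it only illustrates the
# baseline setup the abstract refers to: a model too large for one
# device is split across devices, and standard backpropagation then
# serializes the backward pass across them ("backward locking").
import torch
import torch.nn as nn

class TwoDeviceMLP(nn.Module):
    def __init__(self, d_model=512, n_layers=8):
        super().__init__()
        half = n_layers // 2
        # First half of the layers lives on device 0, second half on
        # device 1 (hypothetical split; assumes two CUDA devices).
        self.part0 = nn.Sequential(
            *[nn.Linear(d_model, d_model) for _ in range(half)]
        ).to("cuda:0")
        self.part1 = nn.Sequential(
            *[nn.Linear(d_model, d_model) for _ in range(n_layers - half)]
        ).to("cuda:1")

    def forward(self, x):
        x = self.part0(x.to("cuda:0"))
        # Activations cross devices here; under vanilla backprop,
        # cuda:0 must wait for cuda:1 to finish its backward pass
        # before computing its own gradients -- the locking that a
        # model-parallel algorithm like Ouroboros aims to relax.
        return self.part1(x.to("cuda:1"))

model = TwoDeviceMLP()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(32, 512)
loss = model(x).pow(2).mean()  # placeholder loss for illustration
loss.backward()                # backward is serialized across devices
opt.step()
```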