This paper presents Flowtron, a new text-to-mel-spectrogram synthesis model based on autoregressive flows. The model is trained by maximizing the likelihood of the training data and allows control over speech variation and style transfer.

Flowtron draws on insights from inverse autoregressive flow (IAF) and reworks Tacotron 2 to provide high-quality, controllable mel-spectrogram synthesis.
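To make the idea of an autoregressive flow trained by maximum likelihood more concrete, here is a minimal toy sketch in plain Python. It is not Flowtron's code: the `conditioner` function and its constants `a` and `c` are hypothetical stand-ins for the neural network that would predict the affine parameters, and the "frames" are single numbers rather than mel-spectrogram frames. It only illustrates three things: an invertible affine step conditioned on previous outputs, an exact log-likelihood, and a `sigma` parameter that scales the latent noise to control how much the samples vary.

```python
import math
import random


def conditioner(prev_x, a=0.5, c=0.3):
    """Hypothetical stand-in for a network: predicts a shift and a log-scale
    from the previous output only (this is what makes the flow autoregressive)."""
    mu = a * prev_x
    log_s = c * math.tanh(prev_x)
    return mu, log_s


def forward(x):
    """Map a data sequence x to latents z and return (z, log-likelihood of x)."""
    z, log_lik, prev = [], 0.0, 0.0
    for x_t in x:
        mu, log_s = conditioner(prev)
        z_t = (x_t - mu) * math.exp(-log_s)
        # Standard-normal log-density of z_t plus log|dz_t/dx_t| = -log_s.
        log_lik += -0.5 * z_t * z_t - 0.5 * math.log(2 * math.pi) - log_s
        z.append(z_t)
        prev = x_t
    return z, log_lik


def inverse(z):
    """Map latents z back to data, one step at a time."""
    x, prev = [], 0.0
    for z_t in z:
        mu, log_s = conditioner(prev)
        x_t = z_t * math.exp(log_s) + mu
        x.append(x_t)
        prev = x_t
    return x


def sample(length, sigma, rng):
    """Draw z ~ N(0, sigma^2) and invert; a smaller sigma gives less variation."""
    z = [sigma * rng.gauss(0.0, 1.0) for _ in range(length)]
    return inverse(z)


if __name__ == "__main__":
    rng = random.Random(0)
    x = sample(5, sigma=0.5, rng=rng)
    z, log_lik = forward(x)
    print("sample:", x)
    print("log-likelihood:", log_lik)
```

In this toy setting, training would mean adjusting the conditioner's parameters to increase `log_lik` on real data, and `sigma=0.0` collapses sampling to a single deterministic output.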

Finally, the paper describes the experiments carried out in this research and shows that Flowtron lets the user transfer characteristics from a source sample or speaker to a target speaker, for example making a monotonic speaker sound more expressive.

Paper: https://arxiv.org/abs/2005.05957

Project: https://nv-adlr.github.io/Flowtron

Code will be available soon: https://github.com/NVIDIA/flowtron