HuangmeiSinger: A Dataset and A Branchformer-Diffusion Model for Huangmei Opera Synthesis
1. Abstract
Singing voice synthesis has been extensively used in the metaverse, music creation and entertainment, and cultural preservation and inheritance. However, the synthesis of traditional operas, such as Huangmei opera, has been limited by the lack of professionally annotated, high-quality datasets and of appropriate deep learning models. In this work, we develop a Huangmei opera singing voice dataset and propose an acoustic model tailored to the unique singing style of Huangmei opera. More specifically, we first propose a data annotation method designed for Huangmei opera, which addresses the challenges posed by the numerous arias in this art form. Next, we construct our Huangmei opera singing voice dataset with detailed musical score information, where each singing recording is captured at a sampling rate of 44.1 kHz. Subsequently, we incorporate the Branchformer encoder and the pitch diffusion module to handle the complex and diverse melodies characteristic of Huangmei opera. Finally, extensive subjective and objective experiments demonstrate the effectiveness of the proposed dataset and model.
Model Architecture
The Dataset Overview. (a) The original sheet music. (b) The singing recording results. (c) The annotation results. The white bold lines indicate syllable boundaries, and the colored dotted lines indicate the boundaries of phonemes in each syllable. Slur stands for continuous sound. (d) The transcription result from pitch to notes and MIDI numbers. (e) The duration result after the alignment process.
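Panel (d) of the dataset overview refers to transcribing pitch into notes and MIDI numbers. As a rough illustration of that kind of mapping, the sketch below uses the standard equal-temperament relation between frequency and MIDI note number (A4 = 440 Hz = MIDI 69). The function names `hz_to_midi` and `midi_to_note_name` are hypothetical, and this is not necessarily the transcription procedure used to build the dataset.

```python
import math

# Pitch-class names within one octave, starting at C (sharps only, for simplicity).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz: float) -> int:
    """Map a frequency in Hz to the nearest MIDI note number.

    Assumes 12-tone equal temperament with A4 = 440 Hz = MIDI 69.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


def midi_to_note_name(midi: int) -> str:
    """Convert a MIDI note number to a name such as 'A4' (MIDI 60 = C4)."""
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


if __name__ == "__main__":
    # Illustrative frequencies only; these are not values from the dataset.
    for f in (261.63, 329.63, 440.0):
        m = hz_to_midi(f)
        print(f, m, midi_to_note_name(m))
```

In practice, a frame-level pitch contour would first be smoothed and segmented into notes before a mapping like this is applied to each segment; that step is omitted here.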