HuangmeiSinger: A Dataset and A Branchformer-Diffusion Model for Huangmei Opera Synthesis

1. Abstract

Singing voice synthesis is widely used in the metaverse, in music creation and entertainment, and in cultural preservation and inheritance. However, the synthesis of traditional operas, such as Huangmei opera, has been limited by the lack of professionally annotated, high-quality datasets and of suitable deep learning models. In this work, we develop a Huangmei opera singing voice dataset and propose an acoustic model tailored to the unique singing style of Huangmei opera. More specifically, we first propose a data annotation method tailored to Huangmei opera, which addresses the challenges posed by the numerous arias in this art form. Next, we construct our Huangmei opera singing voice dataset with detailed musical score information, where each singing recording is captured at a high sampling rate of 44.1 kHz. Subsequently, we incorporate a Branchformer encoder and a pitch diffusion module to handle the complex and diverse melodies characteristic of Huangmei opera. Finally, extensive subjective and objective experiments demonstrate the effectiveness of the proposed dataset and model.

The Dataset Overview. (a) The original sheet music. (b) The singing recording results. (c) The annotation results. The white bold lines indicate syllable boundaries, and the colored dotted lines indicate the boundaries of phonemes in each syllable. Slur stands for continuous sound. (d) The transcription result from pitch to notes and MIDI numbers. (e) The duration result after the alignment process.
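
Panel (d) maps pitch to notes and MIDI numbers. The exact transcription procedure used for the dataset is not described on this page, so the following is only a minimal sketch of the standard equal-temperament conversion, assuming a reference tuning of A4 = 440 Hz (MIDI number 69). The function names `hz_to_midi` and `midi_to_note_name` are hypothetical and chosen for illustration.

```python
import math

# Pitch-class names within one octave, starting at C (sharps only, for simplicity).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz: float) -> int:
    """Convert a fundamental frequency in Hz to the nearest MIDI note number.

    Assumes equal temperament with A4 = 440 Hz mapped to MIDI 69.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))


def midi_to_note_name(midi: int) -> str:
    """Convert a MIDI note number to a note name such as 'A4' (MIDI 60 = C4)."""
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"


if __name__ == "__main__":
    for f0 in (261.63, 329.63, 440.0):
        midi = hz_to_midi(f0)
        print(f"{f0:.2f} Hz -> MIDI {midi} ({midi_to_note_name(midi)})")
```

Rounding to the nearest semitone discards the fine pitch variation of a sung note, so a conversion like this only suits the note-level score representation, not the detailed pitch contour.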

Source Code
Dataset

2. Experimental Results

Sample 1: 要发火, 难发火, 自己的学生要保护 (yào fā huǒ, nán fā huǒ, zì jǐ di xué sēng yào bǎo hù)

Recording	FastSpeech2	DiffSinger	VISinger2	Ours

Sample 2: 叫老婆你别啰嗦, 梳什么头来洗什么脸, 换件衣服就哇就算着哇 (jiào lǎo pó nǐ bié luō suo, sū shén me tóu lái xǐ shén me liǎn, huàn jiàn yī fú jiù wa jiù suàn zhuo wa)

Recording	FastSpeech2	DiffSinger	VISinger2	Ours

Sample 3: 李家母闻凶讯气极败坏, 骂一声王桂英做事不该, 你在娘家不习正派, 带奸夫杀我儿所为何来 (lǐ jiā mǔ wén xiōng xùn qì jí bài huài, mà yī shēng wáng guì yīng zuò sì bù gāi, nǐ zài niáng jiā bù xí zhèng pài, dài jiān fū shā wǒ ér suǒ wéi huó lái)

Recording	FastSpeech2	DiffSinger	VISinger2	Ours

Sample 4: 入深宫 (rù shēn gōng)

Recording	FastSpeech2	DiffSinger	VISinger2	Ours

Sample 5: 我骑在驴背乐悠悠哇乐呀嘛乐悠悠喂 (wǒ qí zài lǘ bèi lè yōu yōu wa lè ya ma lè yōu yōu wèi)

Recording	FastSpeech2	DiffSinger	VISinger2	Ours

Sample 6: 闯进我心上 (chuǎng jìn wǒ xīn shàng)

Recording	FastSpeech2	DiffSinger	VISinger2	Ours

3. Ablation Study

Sample 1: 入深宫 (rù shēn gōng)

Full Model	No Branchformer	No Pitch Diffusion

Sample 2: 急忙走急忙行不觉来到柏子桥 (jí máng zǒu jí máng xíng bù jué lái dào bó zi qiáo)

Full Model	No Branchformer	No Pitch Diffusion