Prof. Wei XUE, Assistant Professor of the Division of Emerging Interdisciplinary Areas (EMIA), shared his insights on building and improving singing voice models at the 18th National Conference on Man-Machine Speech Communication on Dec 8, 2023 and at the 3rd SpeechHome Conference on Speech Technology on Nov 18, 2023.
The National Conference on Man-Machine Speech Communication (NCMMSC) is an important platform for domestic experts, scholars, and researchers in the speech field to exchange their latest research results and promote continued progress in research and development in this area.
Titled "Building the Singing Voice Foundation Model", Prof. XUE shared the research results by his team of constructing a large model of singing foundation to realize cross-gender, language, range, zero-resource, and fast-generation song synthesis. Unlike traditional AI singers that require hours of training data and a fixed repertoire, this model can support lyrics and tune modification, and can achieve the effect of singing any new song using only a few tens of seconds of data to achieve song synthesis instead of simple conversion.
The SpeechHome Conference on Speech Technology aims to promote exchanges on voice technology among industry, academia, and research institutes, provide insight into future trends in technological innovation, and advance the development of intelligent voice technology in cutting-edge and open-source fields.
At this conference, Prof. XUE identified existing problems in vocal synthesis, including the extreme scarcity of labelled data, the high cost of fine-grained labelling, and limited timbre variety. He then introduced an approach to "high-speed, high-quality, zero-resource vocal synthesis". Supported by technologies including CoMoSpeech and ZSinger, this diffusion model-based vocal synthesis method can be deployed in real time for industrial-grade applications and allows the modelling and lyric/melody control of arbitrary human timbres without labelled data.