A research team led by Prof. XUE Wei, Assistant Professor in the Division of Arts and Machine Creativity (AMC) and the Division of Emerging Interdisciplinary Areas (EMIA), has unveiled LLaSA-3B, a text-to-speech (TTS) model that marks a significant advancement in voice synthesis technology. Built on the Llama 3.2 architecture, the model produces ultra-realistic, emotionally expressive speech in both English and Chinese.
Trained on an extensive dataset of 250,000 hours of audio, LLaSA-3B captures intricate speech patterns, accents, and intonations. The model distinguishes itself through advanced emotional expression, generating speech that naturally conveys emotions such as happiness, anger, and sadness, and can even produce whispered delivery. Combined with precise voice cloning and compatibility with existing development tools through its open-weight release, it offers a complete package for creating lifelike digital voices.
Prof. Xue’s work represents a significant step forward in making TTS technology more accessible and versatile across various applications, including entertainment, accessibility, customer service, and education. The model is currently available on Hugging Face, allowing developers and researchers to integrate this advanced voice synthesis technology into their applications.
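For readers who want to experiment, the following is a minimal, unofficial sketch of how such a model might be loaded from Hugging Face with the `transformers` library. The repository ID, prompt format, and decoding step are assumptions for illustration only; the actual interface, including how the generated speech tokens are converted to audio, is described on the model card.

```python
# Hypothetical sketch: loading an LLaSA-style TTS model as a causal LM.
# The repo ID, prompt handling, and codec decoding step are assumptions;
# consult the official Hugging Face model card for the real interface.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "HKUSTAudio/Llasa-3B"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

text = "Hello, this is a test of expressive speech synthesis."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# The model emits discrete speech tokens rather than raw audio samples.
with torch.no_grad():
    speech_token_ids = model.generate(**inputs, max_new_tokens=512)

# A separate neural audio codec (named on the model card) is then needed
# to decode these tokens into a waveform; that step is omitted here
# because its API is model-specific.
print(speech_token_ids.shape)
```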
Read more: LLaSA-3B: A Llama 3.2B Fine-Tuned Text-to-Speech Model with Ultra-Realistic Audio, Emotional Expressiveness, and Multilingual Support (MarkTechPost)