A research team led by Prof. XUE Wei, Assistant Professor in the Division of Arts and Machine Creativity (AMC) and the Division of Emerging Interdisciplinary Areas (EMIA), has unveiled LLaSA-3B, a text-to-speech (TTS) model that marks a significant advancement in voice synthesis technology. Built on the Llama 3.2 architecture, the model produces ultra-realistic, emotionally expressive speech in both English and Chinese.
Trained on an extensive dataset of 250,000 hours of audio, LLaSA-3B captures intricate speech patterns, accents, and intonations. The model distinguishes itself through advanced emotional expression, generating speech that naturally conveys emotions such as happiness, anger, and sadness, and can even produce whispered delivery. Combined with precise voice cloning and compatibility with existing development tools through its open-weight release, it offers a complete package for creating lifelike digital voices.
Prof. Xue’s work represents a significant step forward in making TTS technology more accessible and versatile across various applications, including entertainment, accessibility, customer service, and education. The model is currently available on Hugging Face, allowing developers and researchers to integrate this advanced voice synthesis technology into their applications.
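For readers who want to experiment, the following is a minimal, unofficial sketch of how such a model might be loaded from Hugging Face with the `transformers` library. The repository ID, prompt format, and decoding step are assumptions for illustration only; the actual interface, including how the generated speech tokens are converted to audio, is described on the model card.

```python
# Hypothetical sketch: loading an LLaSA-style TTS model as a causal LM.
# The repo ID, prompt handling, and codec decoding step are assumptions;
# consult the official Hugging Face model card for the real interface.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "HKUSTAudio/Llasa-3B"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

text = "Hello, this is a test of expressive speech synthesis."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# The model emits discrete speech tokens rather than raw audio samples.
with torch.no_grad():
    speech_token_ids = model.generate(**inputs, max_new_tokens=512)

# A separate neural audio codec (named on the model card) is then needed
# to decode these tokens into a waveform; that step is omitted here
# because its API is model-specific.
print(speech_token_ids.shape)
```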
Read more: LLaSA-3B: A Llama 3.2B Fine-Tuned Text-to-Speech Model with Ultra-Realistic Audio, Emotional Expressiveness, and Multilingual Support (MarkTechPost)