GLM-TTS: The New Open-Source Benchmark for Emotional Zero-Shot Chinese TTS Core question most developers are asking in late 2025: Is there finally a fully open-source TTS that can clone any voice with 3–10 seconds of audio, sound emotional, stream in real-time, and handle Chinese polyphones accurately? The answer is yes — and it launched today. On December 11, 2025, Zhipu AI open-sourced GLM-TTS: a production-ready, zero-shot, emotionally expressive text-to-speech system that is currently the strongest open-source Chinese TTS available. Image credit: Official repository Why GLM-TTS Changes Everything — In Four Bullet Points Zero-shot voice cloning: 3–10 s reference audio is …
Supertonic: The Lightning-Fast, Fully On-Device TTS That Actually Works in 2025 Core Question: What exactly is Supertonic, and why is it running 100–167× faster than real-time on a laptop or phone — completely offline? Supertonic is a 66-million-parameter text-to-speech (TTS) model released by Supertone in 2025. Built for extreme on-device performance and powered by ONNX Runtime, it runs 100% locally on everything from smartphones to browsers — no cloud, no API keys, no privacy trade-offs. With just 2 inference steps it already sounds production-ready, and on Apple M4 Pro it hits an insane 167× real-time speed. Why Supertonic Changes Everything: …
WenetSpeech-Yue: A Large-Scale Cantonese Speech Corpus with Multi-Dimensional Annotation Why Cantonese Speech Processing Demands Large-Scale Annotated Resources Cantonese, spoken by approximately 84.9 million native speakers worldwide, presents unique challenges for speech processing due to its rich tone system of nine tones in six categories, coexistence of literary and colloquial forms, and frequent code-switching with English. Despite its linguistic complexity and cultural significance, Cantonese has remained severely under-resourced in speech technology compared to major languages. The development of WenetSpeech-Yue addresses this critical gap by providing the largest open-source Cantonese speech corpus with comprehensive multi-dimensional annotations. The WenetSpeech-Pipe Framework: Building High-Quality Speech …