"It's important," writes Simon Willison, "that everyone understands that voice cloning is now something that's available to anyone with a GPU and a few GBs of VRAM... or in this case a web browser that can access Hugging Face." The model, Qwen3-TTS, is described in the paper (14-page PDF), though honestly I found the paper pretty incomprehensible; I suppose it would be fascinating with more study than I can devote to it. "Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, allowing both the creation of entirely novel voices and fine-grained manipulation over the output speech." I gave it a quick test here and it performed quite well.

