Instructions to use ResembleAI/chatterbox with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Chatterbox
How to use ResembleAI/chatterbox with Chatterbox:
```python
# pip install chatterbox-tts
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
```
- Inference
- Notebooks
- Google Colab
- Kaggle
latency - how to get sub-200ms ultra-low latency as mentioned (#14, opened by saketfractal)
Firstly, thanks for open-sourcing this. I loaded the model on a single A40; model.generate() then took around 9.5 seconds to produce a 12-second output audio file. The description mentions it can achieve sub-200ms latency. Is that only for the paid model, or is that figure based on streaming generation?
The sub-200ms latency is for their paid service.
You need to run it in streaming mode; 200ms is the time to first token (TTFT) on a more powerful CPU.
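To make the distinction concrete: the 9.5 s reported above is total generation time, while TTFT measures only the delay before the first audio chunk arrives. A quick sketch of the arithmetic, using the numbers from this thread:

```python
# Total-generation time vs. streaming latency, using the figures
# reported in this thread.
gen_time_s = 9.5    # time taken by model.generate() on an A40
audio_len_s = 12.0  # duration of the generated audio

# Real-time factor: seconds of compute per second of audio
# (< 1.0 means audio is produced faster than it plays back).
rtf = gen_time_s / audio_len_s
print(f"RTF: {rtf:.2f}")  # ~0.79

# In streaming mode the perceived latency is the time to first
# token/chunk, not the total generation time: playback can begin as
# soon as the first chunk is ready while the rest is still generating.
ttft_s = 0.2  # the sub-200 ms figure quoted in the model description
print(f"Perceived start-up latency when streaming: {ttft_s * 1000:.0f} ms")
```

So a sub-200ms figure and a 9.5 s generate() call are not contradictory: they measure different things.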
How can this be implemented, especially since Chatterbox doesn't have a generate_stream() function?
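Since the library only exposes generate(), one common workaround is to chunk the input text and synthesize sentence by sentence, playing each chunk as soon as it is ready. This is a sketch, not an official API: the split_into_sentences helper below is hypothetical, and stream_tts assumes a ChatterboxTTS model object like the one in the usage example above.

```python
import re

def split_into_sentences(text):
    # Naive sentence splitter (hypothetical helper): break after
    # '.', '!' or '?' followed by whitespace. Good enough for a sketch;
    # a real pipeline would use a proper sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def stream_tts(model, text):
    # Pseudo-streaming: generate audio per sentence and yield each
    # waveform as soon as it is ready, so playback can start after the
    # first short sentence instead of after the entire text.
    # Assumes `model` is a ChatterboxTTS instance as shown earlier.
    for sentence in split_into_sentences(text):
        yield model.generate(sentence)
```

The latency win comes from the first yield: the listener waits only for one sentence's worth of generation, while later sentences are synthesized during playback. True sub-200ms TTFT would still require token-level streaming inside the model itself.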