Instructions to use ResembleAI/chatterbox with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Chatterbox
How to use ResembleAI/chatterbox with Chatterbox:
```python
# pip install chatterbox-tts
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
```
- Inference
- Notebooks
- Google Colab
- Kaggle
latency - how to get sub-200ms ultra-low latency as mentioned (#14, opened by saketfractal)
Firstly, thanks for open-sourcing this. I loaded the model on a single A40; model.generate() then took around 9.5 seconds to produce a 12-second output audio file. The description mentions it can achieve sub-200ms latency. Is that only for the paid model, or is that figure based on streaming generation?
The sub-200ms latency is for their paid service.
You need to run it in streaming mode; 200ms is the time to first token (TTFT) on a more powerful CPU.
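To make the distinction concrete: the 9.5 s reported above is total generation time, while TTFT measures only the delay before the first audio chunk arrives. A quick sketch of the arithmetic, using the numbers from this thread:

```python
# Total-generation time vs. streaming latency, using the figures
# reported in this thread.
gen_time_s = 9.5    # time taken by model.generate() on an A40
audio_len_s = 12.0  # duration of the generated audio

# Real-time factor: seconds of compute per second of audio
# (< 1.0 means audio is produced faster than it plays back).
rtf = gen_time_s / audio_len_s
print(f"RTF: {rtf:.2f}")  # ~0.79

# In streaming mode the perceived latency is the time to first
# token/chunk, not the total generation time: playback can begin as
# soon as the first chunk is ready while the rest is still generating.
ttft_s = 0.2  # the sub-200 ms figure quoted in the model description
print(f"Perceived start-up latency when streaming: {ttft_s * 1000:.0f} ms")
```

So a sub-200ms figure and a 9.5 s generate() call are not contradictory: they measure different things.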
How can this be implemented, especially since Chatterbox doesn't have a generate_stream() function?
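Since the library only exposes generate(), one common workaround is to chunk the input text and synthesize sentence by sentence, playing each chunk as soon as it is ready. This is a sketch, not an official API: the split_into_sentences helper below is hypothetical, and stream_tts assumes a ChatterboxTTS model object like the one in the usage example above.

```python
import re

def split_into_sentences(text):
    # Naive sentence splitter (hypothetical helper): break after
    # '.', '!' or '?' followed by whitespace. Good enough for a sketch;
    # a real pipeline would use a proper sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def stream_tts(model, text):
    # Pseudo-streaming: generate audio per sentence and yield each
    # waveform as soon as it is ready, so playback can start after the
    # first short sentence instead of after the entire text.
    # Assumes `model` is a ChatterboxTTS instance as shown earlier.
    for sentence in split_into_sentences(text):
        yield model.generate(sentence)
```

The latency win comes from the first yield: the listener waits only for one sentence's worth of generation, while later sentences are synthesized during playback. True sub-200ms TTFT would still require token-level streaming inside the model itself.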