Text streaming support #5

Open
Shulyaka opened this issue Jan 17, 2024 · 7 comments
Labels: enhancement (New feature or request)
@Shulyaka

It would be good to support chunked text the same way we support chunked audio. The reason is that LLMs produce text token by token, and when the text is long, we would like to start producing the audio via TTS right away instead of waiting for the whole response.

@sdetweil

sdetweil commented Jan 17, 2024

well, i fiddled with mine to do that..

and it needs a redesign. you can't wait for the response from a chunk, so you have to spin off an async task to handle the waits, and if there is text, send it some place to consolidate with the prior text and maybe signal done.
and then on the send side, I don't know what happens if the handler blocks while transcribing. does it hold up the next block arriving? buffering.. so one would have to spin off another async thread with a queue to handle the transcribes.. and sends..
and figure out how to align the audio data with all the interim transcribes.

@sdetweil

so I modified my asr to do interim results, on the fly.. but as suspected it will take some work to figure out what to do with the audio data..

currently, for testing, if the transcriber returns text (not '') then I send that back and drop the audio input saved up to that point,
effectively starting over... BUT.. this truncates some of the text response..

it should be testing testing testing testing

but I got test test Washington testing test, with a lot of empty text responses in between:

returned text=
returned text=
returned text=
returned text=
returned text=
returned text=
returned text=test
returned text=
returned text=
returned text=
returned text=test
returned text=
returned text=
returned text=Washington
returned text=
returned text=
returned text=testing
returned text=
returned text=
returned text=test
returned text=
returned text=
returned text=
returned text=
returned text=
returned text=
returned text=
returned text=
returned text=
returned text=
returned text=

I don't know what my transcriber does under the covers..

@synesthesiam
Contributor

I think this could be done with appropriate start/stop/chunk events. So for the ASR/STT response, it could be TranscriptStart, TranscriptChunk, TranscriptStop. This way, the server would be able to differentiate it cleanly from the original Transcript, which is the whole thing at once.
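The proposed events could look something like this. A hedged sketch: TranscriptStart/TranscriptChunk/TranscriptStop are not part of the protocol at this point, and the `"type"` strings and JSON shape below are assumptions modeled on Wyoming's JSON-header event style.

```python
import json
from dataclasses import dataclass

# Hypothetical streaming-transcript events, per the proposal above.
# Each serializes to a JSON header of the form {"type": ..., "data": ...}.

@dataclass
class TranscriptStart:
    def event(self) -> str:
        return json.dumps({"type": "transcript-start", "data": {}})

@dataclass
class TranscriptChunk:
    text: str
    def event(self) -> str:
        return json.dumps({"type": "transcript-chunk",
                           "data": {"text": self.text}})

@dataclass
class TranscriptStop:
    def event(self) -> str:
        return json.dumps({"type": "transcript-stop", "data": {}})

# A streaming server would emit TranscriptStart, then one TranscriptChunk
# per piece of interim text, then TranscriptStop; the original Transcript
# event could still carry the full text at the end for old clients.
events = [TranscriptStart().event(),
          TranscriptChunk(text="testing").event(),
          TranscriptStop().event()]
for e in events:
    print(e)
```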

@synesthesiam synesthesiam added the enhancement New feature or request label Jan 18, 2024
@synesthesiam synesthesiam self-assigned this Jan 18, 2024
@sdetweil

Transcript is at the end;
Transcribe is at the start.

I think another param on Transcribe would indicate that the client is enabled for interim results.

TranscriptChunk implies the client is processing the chunks somehow,
but it's streaming from the mic non-stop.

it's unlikely that every client would change.

the current whisper sends the results on AudioStop, not Transcript, anyhow.

@sdetweil

but that doesn't tell the client whether the server will send interim results. currently, Transcribe doesn't have a response.

@sdetweil

sdetweil commented Jan 21, 2024

Maybe we could use the Describe/Info response to indicate whether the asr supports intermediate responses.

then I suppose a new TranscriptChunk event out from the asr would inform the client
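That Describe/Info capability check might look roughly like this. The `supports_partial_transcripts` field is entirely hypothetical (the real Info response has no such flag yet), and the surrounding JSON shape is a simplified assumption.

```python
import json

# Hypothetical Info response: the asr service advertises partial-
# transcript support via a made-up capability flag.
info_response = {
    "type": "info",
    "data": {
        "asr": [{
            "name": "example-stt",
            "supports_partial_transcripts": True,  # hypothetical flag
        }],
    },
}

def client_wants_partials(info: dict) -> bool:
    # The client checks the flag before asking for interim results,
    # so it doesn't request something the server can't do.
    programs = info.get("data", {}).get("asr", [])
    return any(p.get("supports_partial_transcripts") for p in programs)

print(json.dumps(info_response))
print(client_wants_partials(info_response))
```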

@sdetweil

sdetweil commented Nov 26, 2024

I did it with a new param on Transcript...
#33

that doesn't help at the start, though: the sender can't know whether the event receiver can handle partials, or if it's wasted energy.
so Transcribe needs a param to allow requesting partials.. even without knowing IF the stt can do that (which is back on Describe)

I'll add that to #33 for test.
adding a sendPartials:bool (default False) property to Transcribe supports this if the target service can do it and it is enabled; it's ignored if not.

one has to be sure the receiver can recover from not receiving the 'end' event (AudioStop in asr), as it will have sent the final Transcript unsolicited.

but streaming text to TTS will take a couple of changes. Synthesize will need some id/timestamp to sync with the others, and, if not another event, then a continued:bool (default false) to indicate, with its id, that it is more text.
both optional

so: repeat Synthesize, id=same, continued=true
until the last block, id=same, continued=false
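The proposed streaming-Synthesize flow can be sketched as below. Both `id` and `continued` are the suggested optional additions, not existing protocol fields, and the JSON shape is an assumption: every block of one utterance shares an id, with continued=true on all but the last.

```python
import json

# Hypothetical streaming Synthesize event: id groups the blocks of one
# utterance, continued=False marks the final block (both fields are the
# proposed additions, not part of the protocol yet).
def synthesize_event(text: str, id: str, continued: bool) -> str:
    return json.dumps({
        "type": "synthesize",
        "data": {"text": text, "id": id, "continued": continued},
    })

blocks = ["The quick brown fox ", "jumps over ", "the lazy dog."]
events = [
    # continued=True for every block except the last one
    synthesize_event(text, id="utt-1", continued=(i < len(blocks) - 1))
    for i, text in enumerate(blocks)
]
for e in events:
    print(e)
```

The TTS side would start synthesizing as soon as the first event arrives and treat continued=False (or, per the recovery note above, a timeout) as end of utterance.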
