Hopefully a simple question #261
Replies: 3 comments
-
I found a way to accomplish this by sending the audio file to the /asr endpoint, but I may need to add some code to the React frontend for the avatar to respond. Is there a better way to accomplish this, or does anyone know an easier approach?
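For reference, a minimal sketch of what uploading audio to an ASR endpoint could look like. The endpoint path, the raw octet-stream body, and the response shape are assumptions about the backend, not confirmed from the Open-LLM-VTuber API, so adjust them to match the real server:

```python
import json
import urllib.request

def build_asr_request(audio_bytes: bytes,
                      base_url: str = "http://localhost:12393") -> urllib.request.Request:
    """Build a POST request that uploads raw audio bytes to the /asr
    endpoint. The path and body format are assumptions; check the
    backend's actual API before relying on this."""
    return urllib.request.Request(
        url=f"{base_url}/asr",
        data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

# usage (network call, not executed here; 'text' key is an assumption):
# req = build_asr_request(open("question.wav", "rb").read())
# with urllib.request.urlopen(req) as resp:
#     transcript = json.loads(resp.read()).get("text", "")
```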
-
I'm pretty sure I don't understand what you're trying to say. What kind of data are you attempting to send over the websocket? Are you doing something like #245? Are you attempting to use the existing frontend or make your own?

The reason I ask about the frontend is that the frontend is treated as a client, and if you open another websocket connection, that's another client and another conversation. Messages sent from different clients (different websocket connections) are not considered part of the same conversation. This allows multiple users to use the same Open-LLM-VTuber backend and chat in different conversations, with different characters, at the same time. However, we don't have a good room management system that lets multiple things connect to the same conversation. That should be possible (and will be possible), but the code would have to change. So you should probably send that data to the frontend (or, more likely, let the frontend fetch the data stream) and let the frontend compose the request to the backend.

Overall, though, I don't know what you're trying to do. Are you attempting to send both audio and text as one request? The backend, as of now, will treat them as two separate requests, because the frontend only ever sends either text (the user typing) or audio (the user speaking). It doesn't make much sense for text and audio to arrive at the same time, and because the second request would reach the backend before the first one is done, the first request gets interrupted. If your goal is to attach some annotation text to the user's speech audio, we don't support that yet. There are two ways to do this.
-
My project is a modular Python application for facial recognition using TensorFlow and MediaPipe FaceMesh, with audio capture and transcription via a local Whisper model for STT. It writes per-person text file output and includes an enrollment module for unknown faces. I would like to combine my code (which I can throw your way if you like) with your framework to get a talking avatar that can use a local model, like it does now. Think of it as Jarvis from Iron Man: anyone can walk into a room and have the code do 'X', answer 'Y', and complete a process for 'Z'.
-
I have a Python project I am working on that monitors the webcam for facial recognition and captures audio from the person talking. It then extracts any questions from the conversation and saves them as text. I would like to send those to this project through a websocket, but my requests keep getting interrupted. Is there a better way to accomplish this?