Hopefully a simple question #261
Replies: 3 comments
-
I found a way to accomplish this by sending the audio file to the /asr endpoint, but I may need to add some code to the React frontend for the avatar to respond. Is there a better way to accomplish this, or does anyone know an easier approach?
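For reference, a minimal sketch of what uploading audio to an ASR endpoint could look like. The endpoint path, the raw octet-stream body, and the response shape are assumptions about the backend, not confirmed from the Open-LLM-VTuber API, so adjust them to match the real server:

```python
import json
import urllib.request

def build_asr_request(audio_bytes: bytes,
                      base_url: str = "http://localhost:12393") -> urllib.request.Request:
    """Build a POST request that uploads raw audio bytes to the /asr
    endpoint. The path and body format are assumptions; check the
    backend's actual API before relying on this."""
    return urllib.request.Request(
        url=f"{base_url}/asr",
        data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

# usage (network call, not executed here; 'text' key is an assumption):
# req = build_asr_request(open("question.wav", "rb").read())
# with urllib.request.urlopen(req) as resp:
#     transcript = json.loads(resp.read()).get("text", "")
```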
-
I'm pretty sure I don't understand what you're trying to say. What kind of data are you attempting to send over the websocket? Are you doing something like #245? Are you attempting to use the existing frontend or make your own?

The reason I ask about the frontend is that the frontend is treated as a client, and if you open another websocket connection, that's another client and another conversation. Messages sent from different clients (different websocket connections) are not considered part of the same conversation. This allows multiple users to use the same Open-LLM-VTuber backend and chat in different conversations, with different characters, at the same time. However, we don't have a good room management system that lets multiple things connect to the same conversation. That should be possible (and will be possible), but the code would have to change. So you should probably send that data to the frontend (or, more likely, let the frontend fetch the data stream) and let the frontend compose the request to the backend.

Overall, though, I don't know what you're trying to do. Are you attempting to send both audio and text as one request? The backend, as of now, will treat them as two separate requests, because the frontend only ever sends either text (the user typing) or audio (the user speaking). It doesn't make much sense for text and audio to arrive at the same time, and because the second request would reach the backend before the first one is done, the first request gets interrupted. If your goal is to attach some annotation text to the user's speech audio, we don't support that yet. There are two ways to do this.
-
My project is a modular Python application for facial recognition using TensorFlow and MediaPipe FaceMesh, with audio capture and transcription via a local Whisper model for STT. It writes per-person text file output and includes an enrollment module for unknown faces. I would like to combine my code (which I can throw your way if you like) with your framework to get a talking avatar that can use a local model, like it does now. Think of it as Jarvis from Iron Man: anyone can walk into a room and have the code do 'X', answer 'Y', and complete a process for 'Z'.
-
I have a Python project I am working on that monitors the webcam for facial recognition and captures audio from the person talking. It then extracts any questions from the conversation and saves them as text. I would like to send those to this project through a websocket, but my requests keep getting interrupted. Is there a better way to accomplish this?