The computer vision and the "start speaking randomly" (we called it proactive speaking in the frontend) are implemented in
The project is actually amazing. While, as mentioned, the Windows install has several issues, it will come together at some point. I've been testing out all of the code and functions, and it's by far one of the better projects of this kind, if not the best.
I've had some ideas that I'll share down here; hopefully they aren't overwhelming. These ideas are based on the name of the project, "VTuber". If they were considered, the project would need to:
Integrate a small GUI on the website for the AI's responses, and add a button to skip the current dialogue.
Have the AI print whatever it's going to say inside that GUI. The button could work just like the Voice Interruption option does, simply sending over an empty message to stop generation.
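A minimal sketch of the skip button's behavior. The message shape (`type`/`text` fields) is an assumption here, not the project's actual protocol, and the transport is passed in as a plain function so it could be a WebSocket `send` or anything else:

```typescript
// Hypothetical message shape; the real backend protocol may differ.
interface InterruptMessage {
  type: "interrupt";
  text: string; // empty, mirroring how Voice Interruption stops generation
}

// Build the empty message that asks the server to stop generating.
function buildInterruptMessage(): InterruptMessage {
  return { type: "interrupt", text: "" };
}

// The skip button's click handler only needs a way to send text, so the
// transport (e.g. a WebSocket's send method) is injected as a function.
function onSkipClicked(send: (data: string) => void): void {
  send(JSON.stringify(buildInterruptMessage()));
}
```

In the browser this would be wired up as `button.addEventListener("click", () => onSkipClicked((d) => ws.send(d)))`.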
Add a "random" function, where the AI will say something unprompted. It could be a conversation starter, or just some random words. This could be implemented in the config with a "delay" range: if there's silence, execute the function at a random interval between X and Y minutes or seconds.
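The delay-range idea above could be sketched like this. The config field names are made up for illustration; only the uniform-random scheduling is the point:

```typescript
// Hypothetical config for proactive speaking; field names are assumptions.
interface ProactiveConfig {
  minDelaySec: number;
  maxDelaySec: number;
}

// Pick a uniform random delay (in milliseconds) inside the configured range.
function pickProactiveDelayMs(cfg: ProactiveConfig): number {
  const span = cfg.maxDelaySec - cfg.minDelaySec;
  return (cfg.minDelaySec + Math.random() * span) * 1000;
}

// After silence is detected, schedule one proactive utterance. The returned
// handle lets the caller cancel it (clearTimeout) if the user speaks again.
function scheduleProactive(
  cfg: ProactiveConfig,
  speak: () => void,
): ReturnType<typeof setTimeout> {
  return setTimeout(speak, pickProactiveDelayMs(cfg));
}
```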
Implement a config option that determines how long the system waits after the user stops speaking before anything happens.
The system listens to the user speaking through a microphone. Once the user stops speaking, the system waits for a predefined period (X seconds) before initiating a response. This delay ensures that the user has a sufficient window to complete their thoughts or add additional comments without being interrupted by the AI.
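The wait-after-speech check described above reduces to a simple comparison against the configured threshold. This is just a sketch of that decision; the parameter names are placeholders:

```typescript
// Returns true only once the user has been silent for at least the
// configured threshold (all times in milliseconds). The caller would poll
// this, or run it from a timer, after each detected end of speech.
function shouldRespond(
  lastSpeechEndMs: number,
  nowMs: number,
  silenceThresholdMs: number,
): boolean {
  return nowMs - lastSpeechEndMs >= silenceThresholdMs;
}
```

If the user starts speaking again before the threshold elapses, `lastSpeechEndMs` is simply updated, which restarts the window without interrupting them.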
Edit: Oh, forgot one more point:
Self-explanatory: add a function that enables the microphone only while a button is held, just like Discord does. This probably can't be done through the current web setup, unless the audio input is fully disabled/enabled based on key state (no idea what option would work best here).
There are also several advanced ideas that will be harder to implement. Stuff like
This sounds oddly like Neurosama, just from the ideas alone, but these are possibilities used in a vast array of other projects here on GitHub.