The computer vision and the "start speaking randomly" (we called it proactive speaking in the frontend) are implemented in
The project is actually amazing. While, as mentioned, the Windows install has several issues, it will come together at some point. I've been testing out all of the code and functions, and it's by far one of the better projects of this kind, if not the best.
I've had some ideas that I'll share down here; hopefully they aren't overwhelming. These ideas are based on the name of the project, "VTuber". If they were considered, the project would need to:
Integrate a small GUI on the website for the AI's responses, and add a button to skip the current dialogue.
Have the AI print whatever it's going to say inside that GUI. The button could work just like the Voice Interruption option does, simply sending over an empty message to stop generation.
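A minimal sketch of the skip button's behavior. The message shape (`type`/`text` fields) is an assumption here, not the project's actual protocol, and the transport is passed in as a plain function so it could be a WebSocket `send` or anything else:

```typescript
// Hypothetical message shape; the real backend protocol may differ.
interface InterruptMessage {
  type: "interrupt";
  text: string; // empty, mirroring how Voice Interruption stops generation
}

// Build the empty message that asks the server to stop generating.
function buildInterruptMessage(): InterruptMessage {
  return { type: "interrupt", text: "" };
}

// The skip button's click handler only needs a way to send text, so the
// transport (e.g. a WebSocket's send method) is injected as a function.
function onSkipClicked(send: (data: string) => void): void {
  send(JSON.stringify(buildInterruptMessage()));
}
```

In the browser this would be wired up as `button.addEventListener("click", () => onSkipClicked((d) => ws.send(d)))`.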
Add a "random" function, where the AI will say something unprompted. It could be a conversation starter, or just some random words. This could be implemented in the config with a "delay" range: if there's silence, execute the function at a random interval between X and Y minutes or seconds.
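The delay-range idea above could be sketched like this. The config field names are made up for illustration; only the uniform-random scheduling is the point:

```typescript
// Hypothetical config for proactive speaking; field names are assumptions.
interface ProactiveConfig {
  minDelaySec: number;
  maxDelaySec: number;
}

// Pick a uniform random delay (in milliseconds) inside the configured range.
function pickProactiveDelayMs(cfg: ProactiveConfig): number {
  const span = cfg.maxDelaySec - cfg.minDelaySec;
  return (cfg.minDelaySec + Math.random() * span) * 1000;
}

// After silence is detected, schedule one proactive utterance. The returned
// handle lets the caller cancel it (clearTimeout) if the user speaks again.
function scheduleProactive(
  cfg: ProactiveConfig,
  speak: () => void,
): ReturnType<typeof setTimeout> {
  return setTimeout(speak, pickProactiveDelayMs(cfg));
}
```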
Implement a config option that determines how long the system waits after the user stops speaking before anything happens.
The system listens to the user speaking through a microphone. Once the user stops speaking, the system waits for a predefined period (X seconds) before initiating a response. This delay ensures that the user has a sufficient window to complete their thoughts or add additional comments without being interrupted by the AI.
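The wait-after-speech check described above reduces to a simple comparison against the configured threshold. This is just a sketch of that decision; the parameter names are placeholders:

```typescript
// Returns true only once the user has been silent for at least the
// configured threshold (all times in milliseconds). The caller would poll
// this, or run it from a timer, after each detected end of speech.
function shouldRespond(
  lastSpeechEndMs: number,
  nowMs: number,
  silenceThresholdMs: number,
): boolean {
  return nowMs - lastSpeechEndMs >= silenceThresholdMs;
}
```

If the user starts speaking again before the threshold elapses, `lastSpeechEndMs` is simply updated, which restarts the window without interrupting them.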
Edit: Oh, forgot one more point:
Self-explanatory: add a function that enables the microphone only while a button is held, just like Discord does. This probably can't be done through the current web setup, unless the audio input is fully disabled/enabled based on key state (no idea what option would work best here).
There are also several advanced ideas that will be harder to implement. Stuff like
This sounds oddly like Neurosama, just from the ideas alone, but these are possibilities used in a vast array of other projects here on GitHub.