Adding TTS options
kkacsh321 committed Oct 12, 2024
1 parent 3948c1b commit 3ead49b
Showing 11 changed files with 306 additions and 14 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
.DS_Store
./models
models/*
temp_audio.wav
108 changes: 106 additions & 2 deletions README.md
@@ -12,9 +12,15 @@ Interact with a hosted version of this app live at [<https://robotf.ai/Halloween
- [Features](#features-️🕯️)
- [Getting Started](#getting-started-🧹)
- [Docker Compose with LocalAI](#option-1-local-ai-with-docker-compose-🖤)
- [Docker from DockerHub](#option-2-docker-hub-container-👻)
- [Direct Python Development](#option-3-local-development-👨‍💻)
- [Docker from DockerHub](#option-2-docker-hub-container-👻)
- [Direct Python Development](#option-3-local-development-👨‍💻)
- [Running the App](#running-the-app)
- [OpenAI](#openai)
- [Docker Compose](#docker-compose)
- [Using LocalAI/LMStudio/Ollama/etc locally](#using-localailmstudioollamaetc-locally)
- [Using a custom endpoint URL](#using-a-custom-endpoint-url)
- [Development Setup](#development-setup)
- [Text to Speech](#text-to-speech)
- [Contact](#contact)
- [Contributing](#contributing-👥)
- [License](#license-📜)
@@ -24,17 +24,30 @@ Interact with a hosted version of this app live at [<https://robotf.ai/Halloween

Welcome to the eerie realm of the Spooky Streamlit Storyteller! This is no ordinary codebase; it's a haunted mansion of horror stories, where AI and LLMs (Large Language Models) come together to weave chilling tales that will send shivers down your spine. If you're brave enough to conjure up a streaming app with Streamlit that generates spooky Halloween stories, you've just unlocked the creaky front door.

In reality, this is just an example of how to integrate LLMs with Streamlit using Python, LangChain, requests, and even LocalAI (if you don't want to spend money on OpenAI credits). It's a demo of what's possible.
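
As a rough, minimal sketch of that integration (modeled on the app code further down in this commit, with placeholder endpoint, key, and model values rather than the app's real settings), the core streaming pattern looks something like this:

```python
# Minimal sketch: stream a chat completion from an OpenAI-compatible endpoint into Streamlit.
# Run with `streamlit run sketch.py`. The endpoint, key, and model below are placeholders.
import streamlit as st
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",  # LocalAI / LM Studio / Ollama-style endpoint
    openai_api_key="1234",                # most local servers accept any non-empty key
    model="gpt-4",                        # whatever model name your server exposes
    streaming=True,
)

prompt = [{"role": "user", "content": "Tell me a short spooky Halloween story."}]

# st.write_stream consumes the chunk generator and returns the accumulated text.
story = st.write_stream(llm.stream(prompt))
```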

## About the Project 👻

This project is a digital ouija board, channeling the supernatural power of AI to craft horror stories that are as dynamic as they are dreadful. With Streamlit's enchanting capabilities, we've bewitched an app that streams terror with the grace of a ghost gliding through the night.

## Features 🕯️

AI-Powered Storytelling: Summon the spirits of AI to generate tales of terror on the fly.

Bring your own AI/LLM with LocalAI (or another custom OpenAI-compatible API), or use OpenAI.

Interactive UI: Choose your own adventure by selecting story elements that shape your frightening fable.

Real-time Streaming: Experience the horror unfold in real-time as the story mutates before your terrified eyes.

Text to Speech: Don't just read the story, hear it told to you using TTS on OpenAI or LocalAI.

Halloween Humor: Because even in the darkest depths, a chuckle can be the most terrifying sound.

![application](images/app.png)

## Getting Started 🧹

Choose your path to horror story glory with one of these three enchanting options:
@@ -129,6 +146,84 @@ OR

Set your key for OpenAI, or a custom address for your OpenAI-compatible LLM API endpoint.

## Running the App

This depends on which API provider you are going to use:

Set your provider-specific settings (see the basic guides below)

![settings](images/settings.png)

Hit the `Generate Story` button

![story](images/story.png)

If you want to hear the story spoken to you, hit the `Speak it to Me` button

![text-to-speech](images/text-to-speech.png)

### OpenAI

Set your OpenAI API Key at the top left

Select your LLM model (gpt-4)

Select your TTS model (tts-1)

Select your voice (you can change this later to try multiple voices)

Hit the `Generate Story` button and watch it go.

Once the story is done generating, you can hit the `Speak it to Me` button to generate and play the text-to-speech audio.
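
For reference, the `Speak it to Me` button boils down to an OpenAI speech request roughly like the sketch below (the key, voice, input text, and output filename are placeholders, not the app's exact values):

```python
# Sketch of the OpenAI text-to-speech call made when you hit `Speak it to Me`.
# Assumes your key is in the OPENAI_API_KEY environment variable.
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/audio/speech",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "tts-1",
        "voice": "onyx",  # any of the voices in the sidebar dropdown
        "input": "A chill wind rattled the server racks...",
        "response_format": "wav",
    },
    timeout=60,
)
response.raise_for_status()

# The response body is the audio itself; the app writes it to temp_audio.wav.
with open("story.wav", "wb") as f:
    f.write(response.content)
```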

### Docker Compose

Leave the OpenAI API Key at the top left blank

Select the `http://localai:8080/v1` endpoint (internal docker networking)

Select your LLM model (example: gpt-4)

Select your TTS model (example: tts-1)

Select your voice (you can change this later to try multiple voices)

Hit the `Generate Story` button and watch it go.

Once the story is done generating, you can hit the `Speak it to Me` button to generate and play the text-to-speech audio.
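
If you want to confirm the LocalAI container is reachable before generating, a quick check (assuming your compose file publishes LocalAI on host port 8080) is to query the same `/v1/models` endpoint the app uses to populate its model dropdown:

```python
# Sanity check: list the models LocalAI exposes, mirroring the app's get_llm_models() helper.
# Adjust the URL if your docker-compose setup publishes a different host port.
import requests

resp = requests.get(
    "http://localhost:8080/v1/models",
    headers={"Authorization": "Bearer 1234"},  # any value works unless your server enforces a key
    timeout=10,
)
resp.raise_for_status()
print([model["id"] for model in resp.json().get("data", [])])
```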

### Using LocalAI/LMStudio/Ollama/etc locally

Leave the OpenAI API Key at the top left blank

Select the `http://localhost:8080/v1` endpoint, or enter your local server's URL in the `Or Use Another API` field

Select your LLM model (example: gpt-4)

Select your TTS model (example: tts-1)

Select your voice (you can change this later to try multiple voices)

Hit the `Generate Story` button and watch it go.

Once the story is done generating, you can hit the `Speak it to Me` button to generate and play the text-to-speech audio.

### Using a custom endpoint URL

Leave the OpenAI API Key at the top left blank (or set it if your endpoint requires a key)

Enter your custom endpoint URL in the `Or Use Another API` field in the sidebar

Select your LLM model (example: gpt-4)

Select your TTS model (example: tts-1)

Select your voice (you can change this later to try multiple voices)

Hit the `Generate Story` button and watch it go.

Once the story is done generating, you can hit the `Speak it to Me` button to generate and play the text-to-speech audio.

## Development Setup

This repo uses tools such as pre-commit, Task, and Homebrew (for Mac)
@@ -169,6 +264,15 @@ with just plain streamlit
streamlit run RoboTF_Halloween_Stories.py
```

## Text to Speech

For OpenAI, select the `tts-1` model.

For LocalAI, if you want extra voices, copy `voice_models_localai/tts-1.yaml` into the `models/` directory and start up (or restart) the LocalAI container.
This uses Piper under the hood with LocalAI.

Then you should be able to use the full selection of voices in the menu.
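
For reference, the LocalAI path in this commit sends a Piper-style request to the `/tts` endpoint; a standalone sketch (host, port, and voice name are assumptions, so use whichever voice you enabled via `tts-1.yaml`) looks like:

```python
# Sketch of the LocalAI TTS call (Piper backend), mirroring text_to_speech() in this commit.
import requests

resp = requests.post(
    "http://localhost:8080/tts",          # the app swaps "/v1" for "/tts" on non-OpenAI endpoints
    headers={"Content-Type": "application/json"},
    json={
        "backend": "piper",
        "model": "en-us-ryan-high.onnx",  # the app appends ".onnx" to the selected voice
        "input": "Welcome to the haunted data center.",
    },
    timeout=120,
)
resp.raise_for_status()

with open("temp_audio.wav", "wb") as f:  # same temp file the app writes and plays back
    f.write(resp.content)
```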

## Contact

<[email protected]>
165 changes: 155 additions & 10 deletions RoboTF_Halloween_Stories.py
@@ -1,10 +1,15 @@
import emoji
import io
import re
import requests
import streamlit as st
from langchain_openai import ChatOpenAI
from requests.exceptions import RequestException
import requests

accumulated_story = ""

# Function to generate story stream
def generate_story_stream(api_key, endpoint, model, prompt):
def generate_story_stream(api_key, endpoint, model, prompt, on_complete_callback=None):
llm = ChatOpenAI(
base_url=endpoint,
openai_api_key=api_key,
@@ -13,9 +18,98 @@ def generate_story_stream(api_key, endpoint, model, prompt):
)

formatted_input = [{"role": "user", "content": prompt}]

if on_complete_callback:
on_complete_callback(accumulated_story)
return llm.stream(formatted_input)

# Function to handle the completion of the story generation
def on_story_complete(story):
# This function will be called once the story streaming is complete
# You can now use the accumulated_story variable as needed
global accumulated_story
accumulated_story = story
print("Story complete:", accumulated_story)

# Function to remove emojis and special characters for better TTS
def remove_emojis(text):
# Remove emoji using the emoji library
text_without_emojis = emoji.replace_emoji(text, replace='')

# Remove asterisks using regex
text_without_asterisks = re.sub(r"\*", '', text_without_emojis)

# Remove quotes (single and double)
text_without_quotes = re.sub(r"[\"']", '', text_without_asterisks)

# Remove line breaks and extra spaces
text_without_linebreaks = text_without_quotes.replace("\n", " ").replace("\r", " ").strip()

# Remove special characters (except for alphanumeric and spaces)
clean_text = re.sub(r"[^a-zA-Z0-9\s]", '', text_without_linebreaks)

# Replace multiple spaces with a single space
final_cleaned_text = re.sub(r"\s+", ' ', clean_text)

return final_cleaned_text

# Function to get the TTS wav file
def text_to_speech(text, endpoint, api_key, tts_model, voice_selection):
"""
Convert text to speech using the provided TTS endpoint and model.
"""

headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}

if 'api.openai.com' in endpoint:
# If it does, append '/audio/speech' to the endpoint
tts_endpoint = endpoint + '/audio/speech'
payload = {
"input": text,
"model": tts_model,
"voice": voice_selection,
"response_format": "wav"
}
else:
# If it does not, replace '/v1' with '/tts' for LocalAI
tts_endpoint = endpoint.replace("/v1", "/tts")
payload = {
"model": voice_selection+".onnx",
"backend": "piper",
"input": text
}

print(f"tts_endpoint: {tts_endpoint}")
print(f"tts_model: {tts_model}")
print(f"voice: {voice_selection}")
print(f"Payload: {payload}")

response = requests.post(tts_endpoint, headers=headers, json=payload)
print(f"request: {response}")
if response.status_code == 200:
audio_content = response.content
print(response)
print(response.status_code)

audio_content = response.content

with open('temp_audio.wav', 'wb') as f:
f.write(audio_content)

return io.BytesIO(audio_content)

else:
raise RequestException(f"TTS request failed with status code {response.status_code}")

# Function to play the audio
def play_audio(audio_bytes):
"""
Play audio directly within the Streamlit app.
"""
st.audio(audio_bytes, format='audio/wav', autoplay=True)

# Function to query models from LLM URL
def get_llm_models(llm_url, api_key):
headers = {
@@ -24,8 +118,8 @@ def get_llm_models(llm_url, api_key):

try:
response = requests.get(f"{llm_url}/models", headers=headers)
response.raise_for_status() # Raise an exception for HTTP errors
if response.status_code == 200:
response.raise_for_status() # Raise an exception for HTTP errors
if response.status_code == 200:
return [model['id'] for model in response.json().get('data', [])]
else:
st.error("Failed to fetch models. Status code: {response.status_code}")
@@ -39,14 +133,14 @@ def get_llm_models(llm_url, api_key):
st.error(f"An unexpected error occurred: {e}")
return []

def main():
def main():
# Streamlit app
st.title("RoboTF Halloween Story Generator")

st.image("images/robotf_halloween.jpg")
st.sidebar.title("Settings")
api_key = st.sidebar.text_input("OpenAI API Key (Leave Blank for LocalAI)", type="password", value="1234")
default_endpoint = st.sidebar.selectbox("Default Endpoint", ["https://api.openai.com/v1", "http://localai:8080/v1"], index=1)
api_key = st.sidebar.text_input("OpenAI API Key (Leave Blank for LocalAI unless API Key set on Server)", type="password", value="1234")
default_endpoint = st.sidebar.selectbox("Default Endpoint", ["https://api.openai.com/v1", "http://localai:8080/v1", "http://localhost:8080/v1"], index=1)

st.sidebar.write("Or Use Another API")

@@ -60,11 +154,42 @@ def main():
# Sidebar to select the LLM model
model = st.sidebar.selectbox("Select LLM Model", models)

tts_model = st.sidebar.selectbox("Select the TTS Model", models)

# Check if the endpoint contains 'api.openai.com'
if 'api.openai.com' in endpoint:
# If it does, append '/audio/speech' to the endpoint
voice_list = [
"alloy",
"echo",
"fable",
"onyx",
"nova",
"shimmer"
]
else:
# If it does not, replace '/v1' with '/tts'
voice_list = [
"en-us-amy-low",
"en-gb-alan-low",
"en-gb-southern_english_female-low",
"en-us-danny-low",
"en-us-kathleen-low",
"en-us-lessac-low",
"en-us-lessac-medium",
"en-us-libritts-high",
"en-us-ryan-high",
"en-us-ryan-low",
"en-us-ryan-medium",
]

voice_selection = st.sidebar.selectbox("Select the Voice", voice_list)

# Show default prompt and allow changes
st.write(':green[User Prompt]')
user_prompt = """Create a spooky Halloween tale where cutting-edge AI and powerful
hardware like GPUs and CPUs come to life. In this story, large language models (LLMs)
play a central role, but something goes wrong during testing, inference, power
play a central role, but something goes wrong during testing, inference, power
consumption or anything else that is AI related. Perhaps the models start
predicting strange, eerie outcomes, or the hardware begins to malfunction in ways no
one expected. The tale should blend technological horror with classic Halloween
@@ -93,8 +218,28 @@ def main():
if api_key and endpoint and model:
# Clear the story placeholder before generating a new story
story_placeholder.empty()
# Update the global accumulated_story variable
global accumulated_story
# Generate and stream the story into the placeholder
story_placeholder.write_stream(generate_story_stream(api_key, endpoint, model, prompt))
# Pass the on_story_complete function as a callback
accumulated_story = story_placeholder.write_stream(generate_story_stream(api_key, endpoint, model, prompt, on_complete_callback=on_story_complete))
# The on_story_complete function will be called with the full story content
print(accumulated_story)
st.session_state['accumulated_story'] = accumulated_story


if st.button("Speak It To Me"):
print("Speak it to Me Button Clicked")
# Retrieve the generated story
print(f"Full Story: {st.session_state['accumulated_story']}")
story_text = st.session_state['accumulated_story']
clean_text = remove_emojis(story_text)
print(f"Clean Text: {clean_text}")
st.text_area(':green[Generated Story:]', story_text, key="story_text", height=400)
# Convert the story to speech
audio_bytes = text_to_speech(clean_text, endpoint, api_key, tts_model, voice_selection)
# Play the audio
play_audio(audio_bytes)

if __name__ == "__main__":
main()
2 changes: 1 addition & 1 deletion Taskfile.yml
@@ -1,7 +1,7 @@
version: "3"

vars:
IMAGE_VERSION: "v0.0.3"
IMAGE_VERSION: "v0.0.4"
IMAGE_NAME: "robotf/robotf-halloween-stories"

tasks:
Binary file added images/app.png
Binary file added images/settings.png
Binary file added images/story.png
Binary file added images/text-to-speech.png
2 changes: 1 addition & 1 deletion package.json
@@ -1,4 +1,4 @@
{
"name": "robotf-halloween-stories",
"version": "v0.0.3"
"version": "v0.0.4"
}
