A chatbot to talk to AWS Blogs using local LLMs and semantic search.
🌐 Multi-Source Blog Indexing:
- Scrapes AWS blog posts from multiple RSS feeds
- Supports Machine Learning, Security, Big Data, Containers, Databases, Serverless, and Cloud Operations blogs
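To illustrate the feed-scraping step, here is a minimal sketch of parsing an RSS 2.0 feed with Python's standard library. The function name `parse_feed` and the sample XML are hypothetical and not taken from this project; the scraper in `blog_utils.py` may work differently.

```python
import xml.etree.ElementTree as ET


def parse_feed(xml_text: str) -> list[dict[str, str]]:
    """Return title, link, and publication date for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "published": (item.findtext("pubDate") or "").strip(),
            }
        )
    return posts


# Hypothetical sample feed, used only for demonstration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for post in parse_feed(SAMPLE_FEED):
        print(post["title"], post["link"])
```

In a real run, the XML text would come from fetching each feed URL rather than from an inline string.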
🤖 Flexible LLM Backend:
- Uses Ollama for local AI model inference
- Supports dynamic selection of embedding and text generation models
- Currently supports Ollama as the primary model backend
💾 Efficient Data Storage:
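- Stores vectors and metadata locally in LanceDB
- Caches scraped HTML to disk and logs scraping activity in a local DuckDB database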
🔍 Advanced Search Capabilities:
- Semantic search across blog posts
- Multiple search types: vector, full-text, hybrid, and re-ranking
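Hybrid search combines the rankings produced by vector search and full-text search. One common way to merge two ranked lists is reciprocal rank fusion (RRF). The sketch below shows the idea in plain Python; it is an assumption for illustration, not necessarily how this app or LanceDB merges results, and the function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents with higher fused scores come first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["post-a", "post-b", "post-c"]     # hypothetical vector search results
    full_text_hits = ["post-b", "post-d", "post-a"]  # hypothetical full-text results
    print(reciprocal_rank_fusion([vector_hits, full_text_hits]))
```

Documents that rank well in both lists rise to the top, which is why a post found by both search types beats one found by only a single type.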
BlogBuddy.py:
- Main Streamlit application
- Handles user interface and model configuration
- Manages RSS feed refresh and vector store population
lancedb_utils.py:
- Manages LanceDB vector store
- Handles semantic chunking of documents
- Provides advanced search capabilities
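Semantic chunking groups adjacent sentences that are close in meaning, rather than cutting text at a fixed length. The following is a rough sketch of one way to do this: start a new chunk whenever the cosine similarity between neighbouring sentence embeddings falls below a threshold. The names `semantic_chunks` and `toy_embed` and the threshold value are hypothetical; the chunking in `lancedb_utils.py` may use a different method.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_chunks(
    sentences: Sequence[str],
    embed: Callable[[str], Sequence[float]],
    threshold: float = 0.5,
) -> list[list[str]]:
    """Group consecutive sentences whose embeddings stay similar to their predecessor."""
    chunks: list[list[str]] = []
    previous = None
    for sentence in sentences:
        vector = embed(sentence)
        if previous is not None and cosine(previous, vector) >= threshold:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
        else:
            chunks.append([sentence])    # topic shift (or first sentence): new chunk
        previous = vector
    return chunks


def toy_embed(sentence: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model: counts two keywords."""
    words = sentence.lower().split()
    return [float(words.count("lambda")), float(words.count("s3"))]


if __name__ == "__main__":
    text = ["Lambda runs code.", "Lambda scales automatically.", "S3 stores objects."]
    print(semantic_chunks(text, toy_embed))
```

In practice `embed` would call the selected embedding model instead of the keyword counter used here.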
ollama_utils.py:
- Manages Ollama model interactions
- Retrieves embeddings and generates text responses
- Dynamically lists available Ollama models
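The three responsibilities above (listing models, embedding text, generating text) map onto Ollama's local HTTP API. The sketch below calls that API directly with the standard library, assuming an Ollama server is running at its default address. The helper names are hypothetical, and `ollama_utils.py` may instead use the `ollama` Python package or different endpoints.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def _call(path: str, payload: Optional[dict] = None) -> dict:
    """Send a request to the local Ollama server and decode the JSON reply.

    With no payload this is a GET; with a payload it is a JSON POST.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        OLLAMA_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


def model_names(tags_reply: dict) -> list[str]:
    """Extract model names from a GET /api/tags reply."""
    return [model["name"] for model in tags_reply.get("models", [])]


def list_models() -> list[str]:
    """Names of the models currently available to the local Ollama server."""
    return model_names(_call("/api/tags"))


def embed(model: str, text: str) -> list[float]:
    """Embedding vector for a single piece of text."""
    reply = _call("/api/embed", {"model": model, "input": text})
    return reply["embeddings"][0]


def generate(model: str, prompt: str) -> str:
    """Complete text response for a prompt (streaming disabled)."""
    reply = _call("/api/generate", {"model": model, "prompt": prompt, "stream": False})
    return reply["response"]
```

With the server running, `list_models()` returns the names that could populate the model selection boxes, and `generate(name, "Hello")` returns the full response as a string.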
blog_utils.py:
- Scrapes AWS blog RSS feeds
- Processes and cleans blog post content
- Manages DuckDB database for blog post tracking
Prerequisites:
- Python 3.10+
- Ollama
- UV package manager
pip install uv
To install Ollama, refer to the Ollama installation quickstart.
git clone https://github.com/praveenc/ollama-blog-buddy
cd ollama-blog-buddy
uv venv
uv sync
uv run streamlit run app/BlogBuddy.py
Once the app is launched:
- Select an Embedding Model
- Choose a Text Generation Model (LLM)
- Click Save
- Click the Save and Refresh button to download RSS feeds and populate the vector store
- Navigate to the Chat page
- Start chatting with your AWS blog posts!
We use LanceDB as our vector store. Vectors and metadata are stored locally.
To avoid scraping the same RSS feeds multiple times, we cache the scraped HTML to disk and log scraping activity to a local DuckDB database.
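The caching idea can be sketched as follows: store each page on disk under a hash of its URL, fetch only on a cache miss, and record each fetch in a log table. This is an illustration under assumptions, not the project's code. In particular, it uses the standard library's `sqlite3` as a stand-in for DuckDB so that it runs without extra dependencies, and the table name `scrape_log` and all function names are hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path
from typing import Callable


def open_log(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the scrape log (sqlite3 here, standing in for DuckDB)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scrape_log ("
        "url TEXT PRIMARY KEY, cache_path TEXT, "
        "scraped_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def cached_fetch(
    url: str,
    cache_dir: Path,
    log: sqlite3.Connection,
    fetch: Callable[[str], str],
) -> str:
    """Return the HTML for url, calling fetch only when it is not cached on disk."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")  # cache hit: no network access
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    log.execute(
        "INSERT OR REPLACE INTO scrape_log (url, cache_path) VALUES (?, ?)",
        (url, str(path)),
    )
    log.commit()
    return html


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = open_log()
        fake_fetch = lambda url: "<html>demo</html>"  # placeholder for a real HTTP fetch
        cached_fetch("https://example.com/post", Path(tmp), log, fake_fetch)
        print(log.execute("SELECT url, cache_path FROM scrape_log").fetchall())
```

Passing the fetch function as a parameter keeps the cache logic independent of the HTTP library and makes it easy to test without network access.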
Blog posts are indexed from the following AWS RSS feeds:
- AWS Machine Learning blogs
- AWS Security blogs
- AWS Analytics/Big Data blogs
- AWS Containers blogs
- AWS Database blogs
- AWS Serverless blogs
- AWS Cloud Operations and Migrations blogs
Contributions are welcome! Please feel free to submit a Pull Request.
[Add your license information here]