A ChatBot for AWS Blogs 🤖 📋

A chatbot to talk to AWS Blogs using local LLMs and semantic search.

Features

  • 🌐 Multi-Source Blog Indexing:
    • Scrapes AWS blog posts from multiple RSS feeds
    • Supports Machine Learning, Security, Big Data, Containers, Databases, Serverless, and Cloud Operations blogs
  • 🤖 Flexible LLM Backend:
    • Uses Ollama for local AI model inference
    • Supports dynamic selection of embedding and text generation models
    • Currently supports Ollama as the primary model backend
  • 💾 Efficient Data Storage:
    • Vector Store: LanceDB for semantic search and embeddings
    • Metadata Storage: DuckDB for tracking blog post metadata
    • Semantic chunking of blog posts for improved retrieval
  • 🔍 Advanced Search Capabilities:
    • Semantic search across blog posts
    • Multiple search types: vector, full-text, hybrid, and re-ranking
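At its core, the vector search type reduces to comparing embedding vectors by cosine similarity. A minimal, dependency-free sketch of the idea (the app itself delegates this to LanceDB; the toy 3-dimensional embeddings below stand in for the hundreds of dimensions real embedding models produce):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def vector_search(query_vec, chunks, top_k=2):
    """Rank (text, embedding) chunks by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

# Hypothetical blog-post chunks with made-up embeddings.
chunks = [
    ("Scaling SageMaker endpoints", [0.9, 0.1, 0.0]),
    ("Securing S3 buckets",         [0.1, 0.9, 0.1]),
    ("Serverless with Lambda",      [0.2, 0.2, 0.9]),
]
print(vector_search([0.8, 0.2, 0.1], chunks))
```

Full-text search ranks by keyword overlap instead, and hybrid search blends both scores before an optional re-ranking pass.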

Architecture

Components

  1. BlogBuddy.py:
    • Main Streamlit application
    • Handles user interface and model configuration
    • Manages RSS feed refresh and vector store population
  2. lancedb_utils.py:
    • Manages the LanceDB vector store
    • Handles semantic chunking of documents
    • Provides advanced search capabilities
  3. ollama_utils.py:
    • Manages Ollama model interactions
    • Retrieves embeddings and generates text responses
    • Dynamically lists available Ollama models
  4. blog_utils.py:
    • Scrapes AWS blog RSS feeds
    • Processes and cleans blog post content
    • Manages the DuckDB database for blog post tracking
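The feed-scraping step in blog_utils.py boils down to pulling item titles and links out of RSS 2.0 XML. A rough standard-library sketch (the actual module also cleans post content and records metadata in DuckDB; the sample feed below is made up):

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>AWS Machine Learning Blog</title>
  <item><title>Post one</title><link>https://aws.amazon.com/blogs/ml/post-one/</link></item>
  <item><title>Post two</title><link>https://aws.amazon.com/blogs/ml/post-two/</link></item>
</channel></rss>"""

def parse_feed(xml_text):
    """Extract (title, link) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]

for title, link in parse_feed(SAMPLE_FEED):
    print(title, "->", link)
```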

Usage

Prerequisites

Installation

Install uv:

```shell
pip install uv
```

Install Ollama

Refer to the Ollama installation quickstart. Then clone the repository and set up the environment:

```shell
git clone https://github.com/praveenc/ollama-blog-buddy
cd ollama-blog-buddy
uv venv
uv sync
```

Launch Streamlit App

```shell
uv run streamlit run app/BlogBuddy.py
```

Once the app is launched:

  1. Select an Embedding Model
  2. Choose a Text Generation Model (LLM)
  3. Click Save
  4. Click the Save and Refresh button to download RSS feeds and populate the vector store
  5. Navigate to the Chat page
  6. Start chatting with your AWS blog posts!
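Behind the model pickers, ollama_utils.py talks to the locally running Ollama server over its HTTP API. A hedged sketch of what a generation call can look like (Ollama serves a REST API on port 11434 with a /api/generate endpoint; the model name here is just an example, and the exact helper names are illustrative, not the repo's):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def build_generate_request(model, prompt):
    """Assemble the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt):
    """POST a prompt to a local Ollama server and return the response text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama server (`ollama serve`) with the model pulled:
# print(generate("llama3", "Summarize this AWS blog post: ..."))
```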

VectorDB and Storage

We use LanceDB as our vector store. Vectors and metadata are stored locally.

To avoid scraping RSS feeds repeatedly, we cache the scraped HTML to disk and log scraping activity to a local DuckDB database.
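The caching behaviour can be sketched as a small disk cache keyed by a hash of the feed URL; the real code additionally logs each scrape to DuckDB, and the file layout and function names below are illustrative, not the repo's:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("cache")  # illustrative location

def cache_path(url):
    """Map a feed URL to a stable on-disk filename."""
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    return CACHE_DIR / f"{digest}.html"

def fetch_with_cache(url, fetcher):
    """Return cached HTML if present; otherwise fetch and store it."""
    path = cache_path(url)
    if path.exists():
        return path.read_text()
    html = fetcher(url)  # e.g. an HTTP GET
    CACHE_DIR.mkdir(exist_ok=True)
    path.write_text(html)
    return html
```

On a second run with the same URL, the fetcher is never called and the cached copy is returned.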

AWS Blogs RSS Feeds

Blog posts are indexed from the AWS RSS feeds for the blog categories listed under Features.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

[Add your license information here]
