A ChatBot for AWS Blogs 🤖 📋

A chatbot to talk to AWS Blogs using local LLMs and semantic search.

Features

  • 🌐 Multi-Source Blog Indexing:
    • Scrapes AWS blog posts from multiple RSS feeds
    • Supports Machine Learning, Security, Big Data, Containers, Databases, Serverless, and Cloud Operations blogs
  • 🤖 Flexible LLM Backend:
    • Uses Ollama for local AI model inference
    • Supports dynamic selection of embedding and text generation models
    • Currently supports Ollama as the primary model backend
  • 💾 Efficient Data Storage:
    • Vector Store: LanceDB for semantic search and embeddings
    • Metadata Storage: DuckDB for tracking blog post metadata
    • Semantic chunking of blog posts for improved retrieval
  • 🔍 Advanced Search Capabilities:
    • Semantic search across blog posts
    • Multiple search types: vector, full-text, hybrid, and re-ranking
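At its core, the vector search type reduces to comparing embedding vectors by cosine similarity. A minimal, dependency-free sketch of the idea (the app itself delegates this to LanceDB; the toy 3-dimensional embeddings below stand in for the hundreds of dimensions real embedding models produce):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def vector_search(query_vec, chunks, top_k=2):
    """Rank (text, embedding) chunks by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

# Hypothetical blog-post chunks with made-up embeddings.
chunks = [
    ("Scaling SageMaker endpoints", [0.9, 0.1, 0.0]),
    ("Securing S3 buckets",         [0.1, 0.9, 0.1]),
    ("Serverless with Lambda",      [0.2, 0.2, 0.9]),
]
print(vector_search([0.8, 0.2, 0.1], chunks))
```

Full-text search ranks by keyword overlap instead, and hybrid search blends both scores before an optional re-ranking pass.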

Architecture

Components

  1. BlogBuddy.py:
    • Main Streamlit application
    • Handles user interface and model configuration
    • Manages RSS feed refresh and vector store population
  2. lancedb_utils.py:
    • Manages the LanceDB vector store
    • Handles semantic chunking of documents
    • Provides advanced search capabilities
  3. ollama_utils.py:
    • Manages Ollama model interactions
    • Retrieves embeddings and generates text responses
    • Dynamically lists available Ollama models
  4. blog_utils.py:
    • Scrapes AWS blog RSS feeds
    • Processes and cleans blog post content
    • Manages the DuckDB database for blog post tracking
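The feed-scraping step in blog_utils.py boils down to pulling item titles and links out of RSS 2.0 XML. A rough standard-library sketch (the actual module also cleans post content and records metadata in DuckDB; the sample feed below is made up):

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>AWS Machine Learning Blog</title>
  <item><title>Post one</title><link>https://aws.amazon.com/blogs/ml/post-one/</link></item>
  <item><title>Post two</title><link>https://aws.amazon.com/blogs/ml/post-two/</link></item>
</channel></rss>"""

def parse_feed(xml_text):
    """Extract (title, link) pairs from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]

for title, link in parse_feed(SAMPLE_FEED):
    print(title, "->", link)
```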

Usage

Prerequisites

Installation

Install uv:

```shell
pip install uv
```

Install Ollama

Refer to the Ollama installation quickstart. Then clone the repository and set up the environment:

```shell
git clone https://github.com/praveenc/ollama-blog-buddy
cd ollama-blog-buddy
uv venv
uv sync
```

Launch Streamlit App

```shell
uv run streamlit run app/BlogBuddy.py
```

Once the app is launched:

  1. Select an Embedding Model
  2. Choose a Text Generation Model (LLM)
  3. Click Save
  4. Click the Save and Refresh button to download RSS feeds and populate the vector store
  5. Navigate to the Chat page
  6. Start chatting with your AWS blog posts!
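Behind the model pickers, ollama_utils.py talks to the locally running Ollama server over its HTTP API. A hedged sketch of what a generation call can look like (Ollama serves a REST API on port 11434 with a /api/generate endpoint; the model name here is just an example, and the exact helper names are illustrative, not the repo's):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def build_generate_request(model, prompt):
    """Assemble the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt):
    """POST a prompt to a local Ollama server and return the response text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama server (`ollama serve`) with the model pulled:
# print(generate("llama3", "Summarize this AWS blog post: ..."))
```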

VectorDB and Storage

We use LanceDB as our vector store. Vectors and metadata are stored locally.

To avoid scraping RSS feeds repeatedly, we cache the scraped HTML to disk and log scraping activity to a local DuckDB database.
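The caching behaviour can be sketched as a small disk cache keyed by a hash of the feed URL; the real code additionally logs each scrape to DuckDB, and the file layout and function names below are illustrative, not the repo's:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("cache")  # illustrative location

def cache_path(url):
    """Map a feed URL to a stable on-disk filename."""
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    return CACHE_DIR / f"{digest}.html"

def fetch_with_cache(url, fetcher):
    """Return cached HTML if present; otherwise fetch and store it."""
    path = cache_path(url)
    if path.exists():
        return path.read_text()
    html = fetcher(url)  # e.g. an HTTP GET
    CACHE_DIR.mkdir(exist_ok=True)
    path.write_text(html)
    return html
```

On a second run with the same URL, the fetcher is never called and the cached copy is returned.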

AWS Blogs RSS Feeds

Blog posts are indexed from the AWS RSS feeds for the blog categories listed under Features.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

[Add your license information here]
