sandole/openbb-rag-financial-research-agent


RAG Financial Research Agent

A Retrieval-Augmented Generation (RAG) agent for OpenBB Workspace that indexes financial documents (SEC filings, earnings transcripts, research reports) and combines retrieved context with live OpenBB widget data to answer complex financial research questions.

Features

  • 📚 Index and search SEC filings (10-K, 10-Q, 8-K) via SEC EDGAR
  • 🎙️ Index earnings call transcripts
  • 📄 Index PDF research reports
  • 🔍 Semantic search with vector embeddings via ChromaDB
  • 📊 Combine document context with live OpenBB widget data
  • 📝 Automatic source citations with relevance scores
  • ⚡ Streaming responses with reasoning steps
  • 🤖 Multi-LLM support (OpenAI, Ollama, Azure, and more)

How It Works

RAG Pipeline

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│  Embed Query     │───▶│  Vector Search  │
│                 │    │  (OpenAI/Ollama) │    │  (ChromaDB)     │
└─────────────────┘    └──────────────────┘    └────────┬────────┘
                                                        │
                                                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Stream Response│◀───│  Generate Answer │◀───│ Retrieved Docs  │
│  + Citations    │    │  (LLM)           │    │ + Widget Data   │
└─────────────────┘    └──────────────────┘    └─────────────────┘

  1. Query Embedding: User question is converted to a vector using the embedding model
  2. Semantic Search: ChromaDB finds the most relevant document chunks
  3. Context Assembly: Retrieved documents + live widget data are combined
  4. LLM Generation: The LLM generates an answer grounded in the retrieved context
  5. Citation: Sources are automatically cited with relevance scores
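
The five steps above can be sketched as one function. This is a minimal illustration under stated assumptions, not the repository's code: `Chunk`, `answer_question`, and the three injected callables (`embed`, `search`, `generate`) are hypothetical names standing in for the embedding model, the ChromaDB query, and the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Chunk:
    """A retrieved document chunk (hypothetical shape)."""
    source: str
    text: str
    score: float  # relevance score, higher means more relevant


def answer_question(
    question: str,
    widget_data: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list[Chunk]],
    generate: Callable[[str], str],
    top_k: int = 5,
) -> tuple[str, list[tuple[str, float]]]:
    # 1. Query embedding: turn the question into a vector.
    query_vector = embed(question)
    # 2. Semantic search: fetch the most relevant chunks.
    chunks = search(query_vector, top_k)
    # 3. Context assembly: retrieved chunks plus live widget data.
    context = "\n\n".join(f"[{c.source}] {c.text}" for c in chunks)
    prompt = (
        f"Context:\n{context}\n\n"
        f"Widget data:\n{widget_data}\n\n"
        f"Question: {question}"
    )
    # 4. LLM generation, grounded in the assembled prompt.
    answer = generate(prompt)
    # 5. Citations: each source paired with its relevance score.
    citations = [(c.source, c.score) for c in chunks]
    return answer, citations
```

Passing the three stages in as callables keeps the sketch independent of any particular provider; a real implementation would also stream the answer rather than return it in one piece.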

Architecture

┌────────────────────────────────────────────────────────────────┐
│                      OpenBB Workspace                          │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │  User Query   │───▶│  RAG Agent    │◀──▶│ Widget Data   │   │
│  └───────────────┘    └───────────────┘    └───────────────┘   │
└────────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌────────────────────────────────────────────────────────────────┐
│                      RAG Agent Server                          │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │ FastAPI       │    │ Vector Store  │    │ LLM           │   │
│  │ Endpoints     │───▶│ (ChromaDB)    │───▶│ (OpenAI/      │   │
│  │               │    │               │    │  Ollama/etc)  │   │
│  └───────────────┘    └───────────────┘    └───────────────┘   │
│          │                    ▲                                │
│          │            ┌───────┴───────┐                        │
│          │            │  Embeddings   │                        │
│          │            │  (OpenAI/     │                        │
│          │            │   Ollama)     │                        │
│          │            └───────────────┘                        │
│          ▼                                                     │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                    Document Ingestion                    │  │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐      │  │
│  │  │ SEC     │  │Earnings │  │Research │  │ Custom  │      │  │
│  │  │ Filings │  │Transcr. │  │ Reports │  │ Docs    │      │  │
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘      │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘

LLM Provider Support

The agent uses the OpenAI SDK with a configurable base_url, so it works with any OpenAI-compatible API:

| Provider | LLM | Embeddings | Configuration |
| --- | --- | --- | --- |
| OpenAI | ✅ GPT-4o, GPT-4 | ✅ text-embedding-3-* | Default |
| Ollama | ✅ Llama, Qwen, Mistral | ✅ nomic-embed-text | OPENAI_BASE_URL=http://localhost:11434/v1 |
| Azure OpenAI | ✅ | ✅ | Custom base_url + API key |
| Together AI | ✅ Llama, Mixtral | ✅ | base_url=https://api.together.xyz/v1 |
| Groq | ✅ Llama 3 (fast) | ❌ | base_url=https://api.groq.com/openai/v1 |
| Fireworks AI | ✅ | ✅ | base_url=https://api.fireworks.ai/inference/v1 |
| vLLM | ✅ Any HF model | ✅ | Self-hosted |
| LM Studio | ✅ | ✅ | base_url=http://localhost:1234/v1 |
| LocalAI | ✅ | ✅ | Drop-in replacement |
| OpenRouter | ✅ 100+ models | ❌ | base_url=https://openrouter.ai/api/v1 |
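
Switching providers comes down to four environment variables. The sketch below shows one way they might be read into a settings object. `ProviderSettings` and `load_provider_settings` are hypothetical names, and the fallback model names are assumptions taken from the OpenAI example in this README, not confirmed defaults of the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderSettings:
    base_url: Optional[str]  # None means "use the SDK default" (OpenAI)
    api_key: str
    llm_model: str
    embedding_model: str


def load_provider_settings(env: Mapping[str, str] = os.environ) -> ProviderSettings:
    """Read provider settings from environment-style key/value pairs."""
    return ProviderSettings(
        # An empty or missing OPENAI_BASE_URL falls back to the default endpoint.
        base_url=env.get("OPENAI_BASE_URL") or None,
        api_key=env.get("OPENAI_API_KEY", ""),
        # Assumed fallbacks, mirroring the OpenAI configuration example.
        llm_model=env.get("LLM_MODEL", "gpt-4o"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
    )
```

The resulting `base_url` and `api_key` would then be passed to the OpenAI SDK client constructor, for example `OpenAI(base_url=settings.base_url, api_key=settings.api_key)`.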

Using with Ollama (Local LLMs)

# 1. Install Ollama and pull models
ollama pull llama3.2
ollama pull nomic-embed-text

# 2. Configure .env
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama
LLM_MODEL=llama3.2
EMBEDDING_MODEL=nomic-embed-text

# 3. Run the agent
poetry run uvicorn rag_financial_research_agent.main:app --port 7777

Quick Start

Prerequisites

  • Python 3.10+
  • Poetry
  • OpenAI API key (or Ollama for local LLMs)

Installation

cd examples/rag-financial-research-agent
poetry install

Configuration

cp .env.example .env
# Edit .env with your API key and model preferences

OpenAI Configuration:

OPENAI_API_KEY=sk-...
LLM_MODEL=gpt-4o
EMBEDDING_MODEL=text-embedding-3-small

Ollama Configuration:

OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama
LLM_MODEL=llama3.2
EMBEDDING_MODEL=nomic-embed-text

Ingest Sample Documents

# Ingest SEC filings for AAPL, MSFT, GOOGL
poetry run python scripts/ingest_sample_docs.py

Run the Agent

poetry run uvicorn rag_financial_research_agent.main:app --port 7777 --reload

Connect to OpenBB Workspace

  1. Open OpenBB Workspace
  2. Add custom agent with URL: http://localhost:7777/agents.json
  3. Start asking questions!

Example Queries

  • "What are the key risk factors mentioned in Apple's latest 10-K?"
  • "Compare revenue growth between MSFT and GOOGL based on their filings"
  • "What did management say about AI in the last earnings call?"
  • "Summarize Apple's R&D spending and focus areas"

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /agents.json | GET | Agent descriptor for OpenBB Workspace |
| /v1/query | POST | Main query endpoint (SSE streaming) |
| /health | GET | Health check |
| /stats | GET | Vector store statistics |

Query Example

curl -N http://localhost:7777/v1/query \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"human","content":"What are Apple'\''s key risks?"}]}'

Response (SSE stream):

event: copilotStatusUpdate
data: {"eventType":"INFO","message":"Searching financial documents..."}

event: copilotStatusUpdate
data: {"eventType":"INFO","message":"Found 5 relevant documents"}

event: copilotMessageChunk
data: {"delta":"Based on Apple's 10-K filing..."}

event: copilotCitationCollection
data: {"citations":[{"source_info":{"name":"AAPL_10-K_2023"}}]}
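
A client has to split this stream back into events. The following is a minimal parser sketch that handles only the `event:` and `data:` fields shown above and assumes every `data:` payload is JSON; `parse_sse` is a hypothetical helper, not part of the project.

```python
import json
from typing import Any, Iterable, Iterator


def _field_value(line: str, name: str) -> str:
    """Return the text after 'name:', dropping one optional leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, Any]]:
    """Yield (event_name, decoded_json_data) for each event in the stream."""
    event = "message"  # default event name when no 'event:' field is sent
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event, json.loads("\n".join(data_parts))
            event, data_parts = "message", []
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data_parts.append(_field_value(line, "data"))
    if data_parts:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data_parts))
```

Fed the lines of the response above, it would yield pairs such as `("copilotMessageChunk", {"delta": "Based on Apple's 10-K filing..."})`, so a client can concatenate the `delta` values to rebuild the answer.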

Testing

poetry run pytest -v

Linting

poetry run ruff check .
poetry run mypy rag_financial_research_agent

Project Structure

rag-financial-research-agent/
├── rag_financial_research_agent/
│   ├── main.py                        # FastAPI application
│   ├── config.py                      # Configuration settings
│   ├── embeddings.py                  # Embedding generation (OpenAI/Ollama)
│   ├── vector_store.py                # ChromaDB operations
│   ├── retriever.py                   # RAG retrieval logic
│   ├── ingestion/
│   │   ├── base.py                    # Base ingestion interface
│   │   ├── sec_filings.py             # SEC EDGAR ingestion
│   │   ├── earnings_transcripts.py    # Earnings call ingestion
│   │   └── pdf_documents.py           # Generic PDF ingestion
│   └── utils/
│       ├── text_splitter.py           # Document chunking
│       └── prompts.py                 # System prompts
├── tests/                             # Test suite (19 tests)
├── scripts/
│   ├── ingest_sample_docs.py          # Sample ingestion script
│   └── health_check.py                # Health check script
├── data/                              # Document storage
├── pyproject.toml                     # Dependencies
└── .env.example                       # Environment template

Key Components

Document Ingestion

  • SEC Filings: Downloads 10-K, 10-Q, and 8-K filings from SEC EDGAR and chunks them into ~1000-token segments
  • Earnings Transcripts: Parses quarterly earnings call transcripts
  • PDF Documents: Extracts text from research reports via pdfplumber
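
Chunking into fixed-size segments can be illustrated with a short sketch. It approximates tokens with whitespace-separated words, which is an assumption (a real tokenizer counts differently), and the `overlap` parameter is hypothetical: the source only states the ~1000-token segment size.

```python
def chunk_text(text: str, max_tokens: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into segments of at most max_tokens words.

    Consecutive segments share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of the two neighbouring chunks.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    words = text.split()
    chunks: list[str] = []
    step = max_tokens - overlap  # how far the window advances each time
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the window already reached the end of the text
    return chunks
```

With the defaults, a 2500-word document yields three chunks that start at words 0, 900, and 1800.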

Vector Store (ChromaDB)

  • Persistent local storage in ./chroma_db
  • Cosine similarity search
  • Metadata filtering by ticker, document type, date
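
ChromaDB performs the similarity search and filtering internally. The pure-Python sketch below only illustrates the idea of cosine-similarity ranking combined with an exact-match metadata filter; the record layout and the `search` function are hypothetical and do not mirror ChromaDB's API.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query: Sequence[float],
    records: list[dict],
    top_k: int = 5,
    where: Optional[dict] = None,
) -> list[tuple[float, dict]]:
    """Return the top_k (score, record) pairs matching the metadata filter."""
    where = where or {}
    # Keep only records whose metadata equals every key/value in `where`.
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    scored = [(cosine_similarity(query, r["embedding"]), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Filtering before scoring means a query restricted to `{"ticker": "AAPL"}` never ranks chunks from other companies, however similar their embeddings are.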

Retrieval

  • Top-K semantic search (default: 5 documents)
  • Metadata-based filtering (ticker, document type)
  • Context formatting with source attribution
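
Context formatting with source attribution might look like the sketch below. The dictionary keys (`source`, `score`, `text`) and the bracketed numbering are assumptions chosen for illustration; the project's actual prompt layout is not shown in this README.

```python
def format_context(hits: list[dict]) -> str:
    """Render retrieved chunks as numbered, source-attributed blocks."""
    blocks = []
    for index, hit in enumerate(hits, start=1):
        # The header carries the source name and relevance score so the
        # LLM can refer to a chunk by its number when citing.
        header = f"[{index}] {hit['source']} (relevance: {hit['score']:.2f})"
        blocks.append(f"{header}\n{hit['text']}")
    return "\n\n".join(blocks)
```

Numbering the blocks gives the model a stable handle for each source, which makes it easier to map statements in the answer back to citations.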

LLM Generation

  • Streaming responses via SSE
  • Reasoning steps exposed to UI
  • Automatic citation generation
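
On the wire, each streamed item is a named SSE event with a JSON payload, as in the sample response earlier in this README. A minimal sketch of the serialization step follows; `sse_event` is a hypothetical helper, and the event names are the ones from that sample.

```python
import json
from typing import Any


def sse_event(event: str, data: Any) -> str:
    """Serialize one Server-Sent Event with a compact JSON payload."""
    payload = json.dumps(data, separators=(",", ":"))
    # The trailing blank line marks the end of the event.
    return f"event: {event}\ndata: {payload}\n\n"
```

A streaming handler could yield `sse_event("copilotStatusUpdate", {...})` for reasoning steps, `sse_event("copilotMessageChunk", {"delta": ...})` for each piece of the answer, and a final `copilotCitationCollection` event for the citations.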

License

MIT
