A Retrieval-Augmented Generation (RAG) agent for OpenBB Workspace that indexes financial documents (SEC filings, earnings transcripts, research reports) and combines retrieved context with live OpenBB widget data to answer complex financial research questions.
- Index and search SEC filings (10-K, 10-Q, 8-K) via SEC EDGAR
- Index earnings call transcripts
- Index PDF research reports
- Semantic search with vector embeddings via ChromaDB
- Combine document context with live OpenBB widget data
- Automatic source citations with relevance scores
- Streaming responses with reasoning steps
- Multi-LLM support (OpenAI, Ollama, Azure, and more)
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│   Embed Query    │───▶│  Vector Search  │
│                 │    │ (OpenAI/Ollama)  │    │   (ChromaDB)    │
└─────────────────┘    └──────────────────┘    └────────┬────────┘
                                                        │
                                                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Stream Response │◀───│ Generate Answer  │◀───│ Retrieved Docs  │
│   + Citations   │    │      (LLM)       │    │  + Widget Data  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
- Query Embedding: User question is converted to a vector using the embedding model
- Semantic Search: ChromaDB finds the most relevant document chunks
- Context Assembly: Retrieved documents + live widget data are combined
- LLM Generation: The LLM generates an answer grounded in the retrieved context
- Citation: Sources are automatically cited with relevance scores
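
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the agent's actual code: `embed` is a toy bag-of-words stand-in for a real embedding model, a plain list replaces ChromaDB, the LLM generation step is left out, and every function name and sample chunk text is hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[dict], top_k: int = 5) -> list[dict]:
    """Query embedding + semantic search: rank chunks by similarity to the query."""
    q = embed(query)
    scored = [{**c, "score": cosine(q, embed(c["text"]))} for c in chunks]
    return sorted(scored, key=lambda c: c["score"], reverse=True)[:top_k]


def build_prompt(query: str, hits: list[dict], widget_data: str) -> str:
    """Context assembly: retrieved chunks plus live widget data in one prompt."""
    docs = "\n".join(f"[{i}] ({h['source']}) {h['text']}" for i, h in enumerate(hits, 1))
    return f"Documents:\n{docs}\n\nWidget data:\n{widget_data}\n\nQuestion: {query}"


def citations(hits: list[dict]) -> list[dict]:
    """Citation: one entry per retrieved chunk, with its relevance score."""
    return [{"source": h["source"], "relevance": round(h["score"], 3)} for h in hits]


# Illustrative chunks; real ones would come from the ingested filings.
chunks = [
    {"source": "AAPL_10-K_2023", "text": "supply chain risk factors for apple"},
    {"source": "MSFT_10-K_2023", "text": "cloud revenue growth for microsoft"},
]
hits = retrieve("apple risk factors", chunks, top_k=1)
prompt = build_prompt("apple risk factors", hits, "AAPL last price: (from widget)")
print(citations(hits))
```

The prompt returned by `build_prompt` is what would be sent to the LLM in the generation step; the numbered `[i]` tags give the model something concrete to cite.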
┌─────────────────────────────────────────────────────────────────┐
│                        OpenBB Workspace                         │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐    │
│  │  User Query   │───▶│   RAG Agent   │───▶│  Widget Data  │    │
│  └───────────────┘    └───────────────┘    └───────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                        RAG Agent Server                         │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐    │
│  │    FastAPI    │    │ Vector Store  │    │      LLM      │    │
│  │   Endpoints   │───▶│  (ChromaDB)   │───▶│   (OpenAI/    │    │
│  │               │    │               │    │  Ollama/etc)  │    │
│  └───────────────┘    └───────────────┘    └───────────────┘    │
│          │                    ▲                                 │
│          │            ┌───────┴───────┐                         │
│          │            │  Embeddings   │                         │
│          │            │   (OpenAI/    │                         │
│          │            │    Ollama)    │                         │
│          │            └───────────────┘                         │
│          ▼                                                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                    Document Ingestion                     │  │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐       │  │
│  │  │  SEC    │  │Earnings │  │Research │  │ Custom  │       │  │
│  │  │ Filings │  │Transcr. │  │ Reports │  │  Docs   │       │  │
│  │  └─────────┘  └─────────┘  └─────────┘  └─────────┘       │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
The agent uses the OpenAI SDK with a configurable `base_url`, so it works with any OpenAI-compatible API:
| Provider | LLM | Embeddings | Configuration |
|---|---|---|---|
| OpenAI | β GPT-4o, GPT-4 | β text-embedding-3-* | Default |
| Ollama | β Llama, Qwen, Mistral | β nomic-embed-text | OPENAI_BASE_URL=http://localhost:11434/v1 |
| Azure OpenAI | β | β | Custom base_url + API key |
| Together AI | β Llama, Mixtral | β | base_url=https://api.together.xyz/v1 |
| Groq | β Llama 3 (fast) | β | base_url=https://api.groq.com/openai/v1 |
| Fireworks AI | β | β | base_url=https://api.fireworks.ai/inference/v1 |
| vLLM | β Any HF model | β | Self-hosted |
| LM Studio | β | β | base_url=http://localhost:1234/v1 |
| LocalAI | β | β | Drop-in replacement |
| OpenRouter | β 100+ models | β | base_url=https://openrouter.ai/api/v1 |
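
As a rough sketch of how one client can target any of the providers above, the helper below reads `OPENAI_API_KEY` and `OPENAI_BASE_URL` from the environment and passes them to the OpenAI SDK. The function names `client_settings` and `make_client` are hypothetical and are not taken from the agent's `config.py`.

```python
import os


def client_settings(env=None) -> dict:
    """Collect OpenAI-compatible client options from environment variables."""
    env = os.environ if env is None else env
    settings = {}
    for option, variable in (("api_key", "OPENAI_API_KEY"), ("base_url", "OPENAI_BASE_URL")):
        if env.get(variable):  # an unset base URL leaves the SDK on its default endpoint
            settings[option] = env[variable]
    return settings


def make_client(env=None):
    """Build an OpenAI SDK client (requires the `openai` package)."""
    from openai import OpenAI  # imported lazily so client_settings stays dependency-free

    return OpenAI(**client_settings(env))


# Example: the Ollama values from the provider table.
ollama_env = {"OPENAI_BASE_URL": "http://localhost:11434/v1", "OPENAI_API_KEY": "ollama"}
print(client_settings(ollama_env))
```

Switching providers then only means changing the two environment variables; the calling code stays the same.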
# 1. Install Ollama and pull models
ollama pull llama3.2
ollama pull nomic-embed-text
# 2. Configure .env
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama
LLM_MODEL=llama3.2
EMBEDDING_MODEL=nomic-embed-text
# 3. Run the agent
poetry run uvicorn rag_financial_research_agent.main:app --port 7777

- Python 3.10+
- Poetry
- OpenAI API key (or Ollama for local LLMs)
cd examples/rag-financial-research-agent
poetry install

cp .env.example .env
# Edit .env with your API key and model preferences

OpenAI Configuration:
OPENAI_API_KEY=sk-...
LLM_MODEL=gpt-4o
EMBEDDING_MODEL=text-embedding-3-small

Ollama Configuration:
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama
LLM_MODEL=llama3.2
EMBEDDING_MODEL=nomic-embed-text

# Ingest SEC filings for AAPL, MSFT, GOOGL
poetry run python scripts/ingest_sample_docs.py

poetry run uvicorn rag_financial_research_agent.main:app --port 7777 --reload

- Open OpenBB Workspace
- Add custom agent with URL: `http://localhost:7777/agents.json`
- Start asking questions!
- "What are the key risk factors mentioned in Apple's latest 10-K?"
- "Compare revenue growth between MSFT and GOOGL based on their filings"
- "What did management say about AI in the last earnings call?"
- "Summarize Apple's R&D spending and focus areas"
| Endpoint | Method | Description |
|---|---|---|
| `/agents.json` | GET | Agent descriptor for OpenBB Workspace |
| `/v1/query` | POST | Main query endpoint (SSE streaming) |
| `/health` | GET | Health check |
| `/stats` | GET | Vector store statistics |
curl -N http://localhost:7777/v1/query \
-H "Content-Type: application/json" \
  -d '{"messages":[{"role":"human","content":"What are Apple'\''s key risks?"}]}'

Response (SSE stream):
event: copilotStatusUpdate
data: {"eventType":"INFO","message":"Searching financial documents..."}
event: copilotStatusUpdate
data: {"eventType":"INFO","message":"Found 5 relevant documents"}
event: copilotMessageChunk
data: {"delta":"Based on Apple's 10-K filing..."}
event: copilotCitationCollection
data: {"citations":[{"source_info":{"name":"AAPL_10-K_2023"}}]}
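
A client consuming this stream could be sketched as below. The parser is a simplification that assumes each event is a single `event:` line followed by a single `data:` line, as in the sample response; `parse_sse` is a hypothetical helper, not part of the agent.

```python
import json


def parse_sse(stream_text: str) -> list[tuple[str, dict]]:
    """Parse `event:` / `data:` line pairs into (event_name, payload) tuples."""
    events, name = [], None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:") and name is not None:
            events.append((name, json.loads(line[len("data:"):].strip())))
            name = None  # wait for the next event line
    return events


sample = """\
event: copilotStatusUpdate
data: {"eventType":"INFO","message":"Searching financial documents..."}

event: copilotMessageChunk
data: {"delta":"Based on Apple's 10-K filing..."}

event: copilotCitationCollection
data: {"citations":[{"source_info":{"name":"AAPL_10-K_2023"}}]}
"""

events = parse_sse(sample)
# Concatenate the answer text and collect the cited source names.
answer = "".join(p["delta"] for n, p in events if n == "copilotMessageChunk")
sources = [c["source_info"]["name"] for n, p in events
           if n == "copilotCitationCollection" for c in p["citations"]]
print(answer, sources)
```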
poetry run pytest -v

poetry run ruff check .
poetry run mypy rag_financial_research_agent

rag-financial-research-agent/
├── rag_financial_research_agent/
│   ├── main.py                      # FastAPI application
│   ├── config.py                    # Configuration settings
│   ├── embeddings.py                # Embedding generation (OpenAI/Ollama)
│   ├── vector_store.py              # ChromaDB operations
│   ├── retriever.py                 # RAG retrieval logic
│   ├── ingestion/
│   │   ├── base.py                  # Base ingestion interface
│   │   ├── sec_filings.py           # SEC EDGAR ingestion
│   │   ├── earnings_transcripts.py  # Earnings call ingestion
│   │   └── pdf_documents.py         # Generic PDF ingestion
│   └── utils/
│       ├── text_splitter.py         # Document chunking
│       └── prompts.py               # System prompts
├── tests/                           # Test suite (19 tests)
├── scripts/
│   ├── ingest_sample_docs.py        # Sample ingestion script
│   └── health_check.py              # Health check script
├── data/                            # Document storage
├── pyproject.toml                   # Dependencies
└── .env.example                     # Environment template
- SEC Filings: Downloads 10-K, 10-Q, and 8-K filings from SEC EDGAR and chunks them into segments of roughly 1,000 tokens
- Earnings Transcripts: Parses quarterly earnings call transcripts
- PDF Documents: Extracts text from research reports via pdfplumber
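
The chunking step can be illustrated with a small sliding-window splitter. This is a sketch under stated assumptions rather than the contents of `text_splitter.py`: tokens are approximated by whitespace-separated words, and the 100-token overlap between neighbouring chunks is an illustrative choice that the description above does not specify.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of about `chunk_size` tokens sharing `overlap` tokens.

    Tokens are approximated by whitespace-separated words (an assumption).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# A synthetic 2,500-word "filing" to show the window sizes.
filing = " ".join(f"w{i}" for i in range(2500))
chunks = split_into_chunks(filing)
print(len(chunks), [len(c.split()) for c in chunks])
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of some duplicated storage.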
- Persistent local storage in `./chroma_db`
- Cosine similarity search
- Metadata filtering by ticker, document type, date
- Top-K semantic search (default: 5 documents)
- Metadata-based filtering (ticker, document type)
- Context formatting with source attribution
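
The retrieval behaviour listed above (metadata filter, top-K cut, source attribution) might look roughly like this in plain Python. The helper names, the `where`-style filter dictionary, and the sample scores are all illustrative assumptions; the real retriever delegates search and filtering to ChromaDB.

```python
def matches(metadata: dict, where: dict) -> bool:
    """True when every key in `where` equals the chunk's metadata value."""
    return all(metadata.get(key) == value for key, value in where.items())


def top_k(results: list[dict], where: dict, k: int = 5) -> list[dict]:
    """Filter scored chunks by metadata, then keep the k highest scores."""
    kept = [r for r in results if matches(r["metadata"], where)]
    return sorted(kept, key=lambda r: r["score"], reverse=True)[:k]


def format_context(hits: list[dict]) -> str:
    """Render each hit with a numbered source tag the LLM can cite."""
    blocks = []
    for i, hit in enumerate(hits, 1):
        meta = hit["metadata"]
        header = f"[{i}] {meta['ticker']} {meta['doc_type']} (score {hit['score']:.2f})"
        blocks.append(f"{header}\n{hit['text']}")
    return "\n\n".join(blocks)


# Made-up scored chunks, standing in for vector search output.
results = [
    {"text": "Risk factors ...", "score": 0.82, "metadata": {"ticker": "AAPL", "doc_type": "10-K"}},
    {"text": "Quarterly update ...", "score": 0.91, "metadata": {"ticker": "AAPL", "doc_type": "10-Q"}},
    {"text": "Cloud segment ...", "score": 0.88, "metadata": {"ticker": "MSFT", "doc_type": "10-K"}},
]
hits = top_k(results, where={"ticker": "AAPL"}, k=5)
print(format_context(hits))
```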
- Streaming responses via SSE
- Reasoning steps exposed to UI
- Automatic citation generation
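
On the server side, emitting such a stream comes down to formatting each event as an SSE frame. The generator below is a hedged sketch that reuses the event names from the sample response; `sse` and `stream_answer` are hypothetical names, not the agent's actual functions.

```python
import json
from typing import Iterator


def sse(event: str, payload: dict) -> str:
    """Format one SSE frame: event name, JSON data line, terminating blank line."""
    return f"event: {event}\ndata: {json.dumps(payload, separators=(',', ':'))}\n\n"


def stream_answer(tokens: list[str], sources: list[str]) -> Iterator[str]:
    """Yield a reasoning step, then the answer chunks, then the citations."""
    yield sse("copilotStatusUpdate",
              {"eventType": "INFO", "message": "Searching financial documents..."})
    for token in tokens:
        yield sse("copilotMessageChunk", {"delta": token})
    yield sse("copilotCitationCollection",
              {"citations": [{"source_info": {"name": s}} for s in sources]})


frames = list(stream_answer(["Based on ", "Apple's 10-K"], ["AAPL_10-K_2023"]))
print("".join(frames))
```

In a FastAPI app, a generator like this could be wrapped in `StreamingResponse(..., media_type="text/event-stream")`; whether the agent does exactly that is an assumption.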
MIT