Merged
24 commits
4f497d0
feat: Complete Phase 1 planning for code ingestion
RichardHightower Dec 19, 2025
0d54b8c
feat: Expand code ingestion to support 9 programming languages
RichardHightower Dec 19, 2025
bb35ab9
feat: Add Kotlin support to code ingestion feature
RichardHightower Dec 19, 2025
b85db57
feat: Generate implementation tasks for code ingestion feature
RichardHightower Dec 19, 2025
6162882
added docs for how to add more languages support
RichardHightower Dec 19, 2025
51734ba
feat: Complete Phase 1 setup for code ingestion
RichardHightower Dec 19, 2025
41903ce
feat: US2 Cross-Reference Search - unified search across docs and code
RichardHightower Dec 19, 2025
f051b82
feat: US6 SDK Corpus for Book/Tutorial Generation
RichardHightower Dec 19, 2025
1adbc34
feat: US3 Language-Specific Filtering
RichardHightower Dec 19, 2025
8849a33
fix: Resolve all linting issues in codebase
RichardHightower Dec 19, 2025
5d6ef6c
Merge pull request #75 from SpillwaveSolutions/feat/us4-llm-code-summ…
RichardHightower Dec 19, 2025
b81f7ad
feat: US4 Code Summaries via LLM
RichardHightower Dec 19, 2025
2dacf38
fix: Revert to Claude 3.5 Haiku for summarization (not OpenAI)
RichardHightower Dec 19, 2025
061677e
docs: Update configuration comments for Claude 3.5 Haiku
RichardHightower Dec 19, 2025
bdcea1f
Merge pull request #76 from SpillwaveSolutions/feat/us4-llm-code-summ…
RichardHightower Dec 19, 2025
c6ff834
fix: QA gate issues for code ingestion MVP
RichardHightower Dec 19, 2025
e233537
Merge branch '101-code-ingestion' of github.com:SpillwaveSolutions/do…
RichardHightower Dec 19, 2025
c3a04e1
fix: Update Task version in CI workflow
RichardHightower Dec 19, 2025
3c207e2
fix: Resolve remaining line length linting errors
RichardHightower Dec 19, 2025
673fcc0
docs: Update documentation for code ingestion MVP
RichardHightower Dec 19, 2025
baf7d7e
docs: Emphasize mandatory QA gate requirement in AGENTS.md
RichardHightower Dec 19, 2025
a07e9c5
fix: Progress tracking during multi-language code chunking
RichardHightower Dec 19, 2025
19c06bb
fix: Progress tracking during multi-language code chunking
RichardHightower Dec 19, 2025
71d4ad6
fix: Resolve linting and type checking issues
RichardHightower Dec 19, 2025
4 changes: 2 additions & 2 deletions .env.example
@@ -24,8 +24,8 @@ EMBEDDING_MODEL=text-embedding-3-large
# Get your API key from: https://console.anthropic.com/settings/keys
ANTHROPIC_API_KEY=sk-ant-your-anthropic-api-key-here

-# Claude model for summarization (default: claude-3-5-haiku-20241022)
-CLAUDE_MODEL=claude-3-5-haiku-20241022
+# Claude model for summarization (default: claude-haiku-4-5-20251001)
+CLAUDE_MODEL=claude-haiku-4-5-20251001

# =============================================================================
# Chroma Vector Store Configuration
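
The `CLAUDE_MODEL` setting in this hunk could be consumed along the following lines. This is a minimal sketch, assuming `claude-haiku-4-5-20251001` is the new default named in the comment; `resolve_claude_model` is a hypothetical helper for illustration, not a function from the Doc-Serve code base.

```python
import os
from typing import Mapping, Optional

# Default copied from the .env.example comment in this diff (assumed to be the new value).
DEFAULT_CLAUDE_MODEL = "claude-haiku-4-5-20251001"


def resolve_claude_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the CLAUDE_MODEL setting, falling back to the documented default.

    Hypothetical helper: Doc-Serve's real settings loader may work differently.
    """
    source = os.environ if env is None else env
    # Treat an unset or blank variable as "use the default".
    value = source.get("CLAUDE_MODEL", "").strip()
    return value or DEFAULT_CLAUDE_MODEL
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test; calling `resolve_claude_model()` with no argument reads the real environment.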
2 changes: 1 addition & 1 deletion .github/workflows/pr-qa-gate.yml
@@ -29,7 +29,7 @@ jobs:
- name: Install Task
uses: arduino/setup-task@v2
with:
-version: 3.x
+version: 3.43.3

- name: Install Poetry
uses: snok/install-poetry@v1
4 changes: 3 additions & 1 deletion AGENTS.md
@@ -162,7 +162,9 @@ task before-push

This runs format, lint, typecheck, and tests with coverage.

-**MANDATORY**: You MUST run `task pr-qa-gate` before checking in or pushing any changes. You should also run `task pr-qa-gate` whenever checking project status or SDD status.
+**IMPORTANT**: You MUST run `task pr-qa-gate` before checking in or pushing any changes. You should also run `task pr-qa-gate` whenever checking project status or SDD status.

+**MANDATORY**: Any feature or task is not considered done unless `task pr-qa-gate` passes successfully.

## Git Workflow

4 changes: 4 additions & 0 deletions CLAUDE.md
@@ -198,6 +198,10 @@ Do NOT push code that fails `task before-push`.
## Active Technologies
- Python 3.10+ + FastAPI, LlamaIndex, ChromaDB, OpenAI, rank-bm25 (100-bm25-hybrid-retrieval)
- ChromaDB (Vector Store), Local Persistent BM25 Index (LlamaIndex) (100-bm25-hybrid-retrieval)
+- Python 3.10+ + LlamaIndex (CodeSplitter, SummaryExtractor), tree-sitter parsers, ChromaDB (101-code-ingestion)
+- ChromaDB (unified vector store), Disk-based BM25 index (101-code-ingestion)
+- Python 3.10+ + LlamaIndex (CodeSplitter, SummaryExtractor), tree-sitter (AST parsing), OpenAI/Anthropic (embeddings/summaries) (101-code-ingestion)
+- ChromaDB vector store (existing) (101-code-ingestion)

## Recent Changes
- 100-bm25-hybrid-retrieval: Added Python 3.10+ + FastAPI, LlamaIndex, ChromaDB, OpenAI, rank-bm25
31 changes: 30 additions & 1 deletion README.md
@@ -12,13 +12,41 @@ Doc-Serve is a monorepo containing three packages:
| **doc-svr-ctl** | Command-line interface for managing the server |
| **doc-serve-skill** | Claude Code skill for AI-powered documentation queries |

+## Code Ingestion & Search
+
+Doc-Serve now supports unified search across documentation and source code:
+
+- **10 Programming Languages**: Python, TypeScript, JavaScript, Java, Kotlin, C, C++, Go, Rust, Swift
+- **AST-Aware Chunking**: Intelligent code parsing and chunking using tree-sitter
+- **Cross-Reference Queries**: Search across docs and code simultaneously
+- **Language Filtering**: Filter results by programming language
+- **Source Type Filtering**: Separate results by documentation vs. source code
+- **LLM Code Summaries**: AI-generated summaries improve semantic search quality
+
+### Example: Code-Aware Search
+```bash
+# Index both docs and code
+doc-svr-ctl index ./my-project --include-code
+
+# Search across everything
+doc-svr-ctl query "authentication implementation"
+
+# Filter by code only
+doc-svr-ctl query "API endpoints" --source-types code --languages python
+```
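
The `--source-types` and `--languages` flags in the example above narrow results by chunk metadata. A minimal sketch of that kind of filtering follows, assuming each indexed chunk carries a `source_type` and an optional `language` field. The `Chunk` class and `filter_chunks` function are hypothetical and are not taken from Doc-Serve.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical indexed chunk; the field names are assumptions for this sketch."""

    text: str
    source_type: str                # "doc" or "code"
    language: Optional[str] = None  # e.g. "python"; None for documentation


def filter_chunks(
    chunks: Iterable[Chunk],
    source_types: Optional[Iterable[str]] = None,
    languages: Optional[Iterable[str]] = None,
) -> List[Chunk]:
    """Keep chunks matching every filter that was supplied (None means no filter)."""
    wanted_types = {s.lower() for s in source_types} if source_types else None
    wanted_langs = {name.lower() for name in languages} if languages else None

    kept = []
    for chunk in chunks:
        if wanted_types is not None and chunk.source_type.lower() not in wanted_types:
            continue
        if wanted_langs is not None and (chunk.language or "").lower() not in wanted_langs:
            continue
        kept.append(chunk)
    return kept
```

Under these assumptions, `filter_chunks(results, source_types=["code"], languages=["python"])` mirrors the last CLI command: documentation chunks and code in other languages are dropped, and the remaining order is preserved.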

## Features

+- **Code Ingestion**: Index and search across documentation AND source code
+- **Cross-Reference Search**: Unified queries across docs and code with intelligent filtering
+- **Language-Aware Processing**: AST-based chunking for 10+ programming languages
- **Hybrid Search**: Combines semantic meaning (Vector) with exact keyword matching (BM25)
- **Semantic Search**: Natural language queries using OpenAI embeddings
- **Keyword Search**: Precise term matching for technical documentation
+- **Advanced Filtering**: Filter by source type (doc/code) and programming language
- **Vector Store**: ChromaDB for efficient similarity search
-- **Context-Aware Chunking**: Intelligent document splitting with overlap
+- **Context-Aware Chunking**: Intelligent document and code splitting with overlap
+- **LLM Summaries**: AI-generated summaries for code chunks improve semantic search
- **REST API**: Full OpenAPI-documented REST interface
- **CLI Tool**: Comprehensive command-line management
- **Claude Integration**: Native Claude Code skill for AI workflows
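
The Hybrid Search bullet above says vector and BM25 results are combined, but the README does not say how. One common approach is a weighted sum of normalized scores. The sketch below illustrates that approach only and is not Doc-Serve's actual fusion logic; `hybrid_rank`, `_min_max`, and the `alpha` weight are hypothetical names.

```python
from typing import Dict, List, Tuple


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two retrievers become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (value - lo) / (hi - lo) for doc_id, value in scores.items()}


def hybrid_rank(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Fuse two score maps; alpha=1.0 is vector-only, alpha=0.0 is BM25-only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    vec = _min_max(vector_scores)
    kw = _min_max(bm25_scores)
    # A document missing from one retriever contributes 0 from that side.
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1.0 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Normalizing first matters because BM25 scores are unbounded while similarity scores usually sit in a fixed range; without it the keyword side would dominate the sum.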
@@ -155,6 +183,7 @@ doc-serve/
- **Embeddings**: OpenAI text-embedding-3-large
- **Summarization**: Claude Haiku
- **Indexing**: LlamaIndex
+- **Code Parsing**: Tree-sitter (AST-aware chunking)
- **CLI**: Click + Rich
- **Build System**: Poetry
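
AST-aware chunking, as named in the Code Parsing entry above, means splitting source at syntactic boundaries rather than at fixed character counts. The sketch below shows the idea for Python only, using the standard-library `ast` module as a stand-in for tree-sitter so that it runs without extra dependencies. It is not the LlamaIndex `CodeSplitter` configuration used by this PR, and `chunk_python_source` is a hypothetical name.

```python
import ast
from typing import List


def chunk_python_source(source: str) -> List[str]:
    """Return one chunk per top-level function or class definition.

    Stand-in for tree-sitter based chunking: other top-level statements
    (imports, constants) are skipped in this simplified sketch.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([node.lineno] + [dec.lineno for dec in node.decorator_list])
            end = node.end_lineno  # available on Python 3.8+
            chunks.append("\n".join(lines[start - 1:end]))
    return chunks
```

Each chunk is a complete definition, so a search hit never begins halfway through a function body, which is the main advantage over fixed-size splitting.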
