Ultra-low latency, highly accurate RAG context compression pipeline.
Context Sieve v8.5 uses a 2-stage hierarchical pipeline to compress retrieved context tokens by 60-75% before sending them to the Main LLM.
- Stage 1: Parent-Child retrieval with 480-token parent chunks.
- Stage 2: NanoPruner (MiniLM-L6) token classification with adaptive POS-based dilation and entropy-based confidence fallback.
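Stage 1 can be sketched as follows. The chunk sizes match the pipeline above, but `build_index`, `retrieve_parent`, and the bag-of-words overlap score are illustrative stand-ins for the real embedding retriever, not the project's API.

```python
# Toy sketch of Stage 1 (parent-child retrieval): index a document into
# 480-token parent chunks split into smaller child chunks, score the children
# against the query, and return the best child's *parent* as context.
from dataclasses import dataclass

@dataclass
class Chunk:
    parent_id: int
    text: str

def build_index(document, parent_size=480, child_size=120):
    tokens = document.split()
    parents, children = [], []
    for pid, start in enumerate(range(0, len(tokens), parent_size)):
        ptoks = tokens[start:start + parent_size]
        parents.append(" ".join(ptoks))
        for c in range(0, len(ptoks), child_size):
            children.append(Chunk(pid, " ".join(ptoks[c:c + child_size])))
    return parents, children

def retrieve_parent(query, parents, children):
    # Bag-of-words overlap stands in for embedding similarity.
    qset = set(query.lower().split())
    best = max(children, key=lambda ch: len(qset & set(ch.text.lower().split())))
    return parents[best.parent_id]
```

Retrieving on small child chunks keeps matching precise, while returning the larger parent gives the Main LLM enough surrounding context for Stage 2 to prune.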
```mermaid
flowchart TB
    subgraph OFFLINE["Offline Training Pipeline"]
        direction TB
        C["Corpus\n(query + document pairs)"]
        C --> DG["Data Generator\n(Proxy LLM extracts key spans)"]
        DG --> AL["Label Aligner\n(RapidFuzz char-level matching)"]
        AL --> TD["Training Data\n(token-level 0/1 labels)"]
        TD --> TR["Trainer\n(Fine-tune MiniLM-L6)"]
        TR --> OX["ONNX Export + INT8 Quantize"]
        OX --> CAL["Threshold Calibrator\n(Binary search on val set)"]
        CAL --> MODEL["model_int8.onnx\n+ calibration.json"]
    end

    subgraph RL["RL Fine-tuning (Contextual Bandit)"]
        direction TB
        NIAH["NIAH Data Generator\n(needle-in-haystack)"] --> ENV["Environment\n(T=1 Bandit)"]
        CHK_PT["Cold-Start Model"] --> PPO["PPO Trainer\n(Bernoulli Action)"]
        ENV <-->|"keep_mask / reward"| PPO
        PPO --> JUDGE["LLM Judge proxy\n(+ R_linkage, R_size)"]
        JUDGE --> PPO
        PPO --> BEST_RL["best_rl_model.pt"]
    end

    subgraph INDEXER["Offline Indexer"]
        direction TB
        DOC["Raw Documents"] --> CHK["480-Token\nParent Chunks"]
        CHK --> POS["spaCy POS Tagger"]
        POS --> DIL["Boolean Dilation Array\n(Noun/Number/Negation)"]
    end

    subgraph HOTPATH["Real-Time Inference Hotpath"]
        direction TB
        Q["User Query"] --> FWD
        CTX["Retrieved Chunks"] --> FWD
        DIL -.->|"zero-cost lookup"| DILATE
        MODEL -.->|"load once"| FWD
        FWD["NanoPruner Forward Pass\n(ONNX INT8)"]
        FWD --> ENT{"Entropy\nCheck"}
        ENT -->|"High uncertainty"| BYPASS["Bypass → Return Raw"]
        ENT -->|"Confident"| THRESH["Apply Calibrated\nThreshold"]
        THRESH --> DILATE["Adaptive Dilation\n(Protect nouns/numbers)"]
        DILATE --> CAP["Sentence-Aware\nHard Cap"]
        CAP --> RECON["Offset-Based\nReconstruction"]
        RECON --> OUT["Compressed Text\n(60-75% smaller)"]
    end

    style OFFLINE fill:#1a1a2e,stroke:#e94560,color:#eee
    style RL fill:#301b3f,stroke:#fca311,color:#eee
    style INDEXER fill:#16213e,stroke:#0f3460,color:#eee
    style HOTPATH fill:#0f3460,stroke:#53d769,color:#eee
    style MODEL fill:#e94560,stroke:#e94560,color:#fff
    style OUT fill:#53d769,stroke:#53d769,color:#000
```
```bash
python -m venv .venv
.\.venv\Scripts\activate
pip install pysbd rapidfuzz sentence-transformers transformers spacy onnxruntime onnx onnxscript numpy torch python-dotenv
python -m spacy download en_core_web_sm
```

Edit `.env` with your proxy credentials:

```
API_URL=https://your-proxy/v1/chat/completions
API_KEY=your-key-here
```

```bash
# Generate 250 synthetic examples using 10 parallel threads
python tools/expand_corpus.py --count 250 --workers 10
```

Generates synthetic (query, document) pairs using your proxy LLM and appends them to `data/corpus.json`.
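Internally this is a thread-pool fan-out over the proxy LLM. A minimal sketch of the idea (the `generate_pair` stub and the `{"query", "document"}` schema are assumptions here, not the real tool):

```python
# Minimal sketch of parallel corpus expansion: fan N generation calls out over
# a thread pool. generate_pair stands in for the real proxy-LLM request
# (which reads API_URL/API_KEY from .env in the actual tool).
import json
from concurrent.futures import ThreadPoolExecutor

def generate_pair(seed):
    # Stand-in for one proxy-LLM call producing a synthetic (query, document) pair.
    return {"query": f"question {seed}?", "document": f"synthetic document {seed}"}

def expand_corpus(count, workers=10):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate_pair, range(count)))

pairs = expand_corpus(count=5, workers=2)
print(json.dumps(pairs, indent=2))  # entries like these get appended to data/corpus.json
```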
```bash
# Generate labels for the corpus (10-15 parallel workers is a good range)
python -m context_sieve.trainer.data_generator --corpus data/corpus.json --output data/training_data.json --workers 10
```

Supports resuming: if the process stops, simply restart it to pick up where you left off.
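The alignment step that turns extracted spans into token labels can be approximated as below. This is a sketch of the idea only: `difflib` stands in for RapidFuzz, and whitespace tokenization stands in for the MiniLM tokenizer's offset mapping.

```python
# Sketch of token-level label alignment: the proxy LLM returns an extracted
# key span as free text; we locate it in the document by character-level
# matching and label overlapping tokens 1, all others 0.
import difflib
import re

def align_labels(document, extracted_span):
    # Token character offsets from a simple whitespace tokenizer.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", document)]
    # Character-level match of the extracted span against the document.
    sm = difflib.SequenceMatcher(None, document, extracted_span, autojunk=False)
    blocks = [b for b in sm.get_matching_blocks() if b.size > 0]
    if not blocks:
        return [0] * len(tokens)
    lo = min(b.a for b in blocks)
    hi = max(b.a + b.size for b in blocks)
    # A token is "key" (label 1) if it overlaps the matched character range.
    return [1 if (start < hi and end > lo) else 0 for _, start, end in tokens]
```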
```bash
# Full pipeline (train + ONNX export + INT8 quantize + calibrate)
python -m context_sieve.trainer.train --data data/training_data.json --full-pipeline --epochs 15

# Resume after a crash (skip training, just export and calibrate)
python -m context_sieve.trainer.train --data data/training_data.json --full-pipeline --skip-training
```

Once the cold-start model is trained (`best_checkpoint.pt`), start the RL pipeline to optimize for complex multi-turn coreferences using the Contextual Bandit PPO loop:

```bash
python -m context_sieve.rl.train --cold_start models/nanopruner/best_checkpoint.pt
```

Run the benchmark:

```bash
python benchmark_runner.py
```

Auto-detects the trained ONNX model; results are logged to `result.log`.
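For intuition on the bandit's reward shaping, here is an illustrative combination of the judge, linkage, and size terms. The weights and exact term definitions are assumptions for this sketch, not the project's actual reward function.

```python
# Sketch of T=1 contextual-bandit reward shaping: one Bernoulli keep/drop
# action per token, one scalar reward per episode.
def reward(keep_mask, answer_correct, w_link=0.2, w_size=0.3):
    n = len(keep_mask)
    r_judge = 1.0 if answer_correct else -1.0            # LLM-judge proxy term
    # R_linkage: penalize fragmentation (kept tokens separated by gaps).
    transitions = sum(1 for a, b in zip(keep_mask, keep_mask[1:]) if a != b)
    r_link = -transitions / max(n - 1, 1)
    # R_size: compression pressure, rewarding dropped tokens.
    r_size = 1.0 - sum(keep_mask) / n
    return r_judge + w_link * r_link + w_size * r_size
```

With this shaping, a contiguous keep mask beats an equally sized but fragmented one, which nudges PPO toward coherent spans rather than scattered tokens.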
| Module | Purpose |
|---|---|
| `context_sieve/indexer.py` | Offline indexing with POS metadata generation |
| `context_sieve/inference.py` | Real-time compression engine |
| `context_sieve/trainer/aligner.py` | GPT-4o extractive label alignment (RapidFuzz) |
| `context_sieve/trainer/calibrate.py` | INT8 threshold calibration |
| `context_sieve/trainer/data_generator.py` | LLM-powered training data generation |
| `context_sieve/trainer/train.py` | Fine-tuning + ONNX export pipeline |
| `context_sieve/rl/` | PPO Contextual Bandit, LLM Judge, and NIAH Env |
| `tools/expand_corpus.py` | Synthetic corpus expansion |
- Zero-Latency Dilation: POS tagging (Noun/Number/Negation) is performed offline during indexing. The inference engine uses a pure boolean array lookup.
- Entropy Fallback: If the model is uncertain, pruning is bypassed to preserve semantic safety.
- Sum-of-Prob Sentence Scoring: Avoids length bias in hard-capping by scoring sentences based on total token probabilities.
- Offset-Based Reconstruction: Compressed text preserves original spacing, punctuation, and casing.
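The entropy fallback, calibrated threshold, dilation lookup, and offset-based reconstruction compose roughly as below. The threshold, entropy cap, and dilation radius here are illustrative values, not the shipped calibration, and the function names are this sketch's own.

```python
# Sketch of the hotpath decision logic on precomputed per-token keep
# probabilities (the real probabilities come from the ONNX INT8 forward pass).
import math

def compress(text, token_spans, probs, protected,
             threshold=0.5, entropy_cap=0.6, dilation=1):
    # Entropy fallback: if mean Bernoulli entropy is high, bypass and return raw.
    def h(p):
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    if sum(h(p) for p in probs) / len(probs) > entropy_cap:
        return text
    # Calibrated threshold, then dilation: each kept token rescues protected
    # neighbors (nouns/numbers/negations flagged offline by the indexer).
    keep = [p >= threshold for p in probs]
    for i, kept in enumerate(list(keep)):
        if kept:
            for j in range(max(0, i - dilation), min(len(keep), i + dilation + 1)):
                keep[j] = keep[j] or protected[j]
    # Offset-based reconstruction: adjacent kept tokens retain their original
    # separator text (spacing/punctuation/casing); gaps collapse to one space.
    out, prev_i = [], None
    for i, ((start, end), k) in enumerate(zip(token_spans, keep)):
        if not k:
            continue
        if prev_i is not None:
            out.append(text[token_spans[prev_i][1]:start] if prev_i == i - 1 else " ")
        out.append(text[start:end])
        prev_i = i
    return "".join(out)
```

Because dilation is a boolean-array lookup over the offline `protected` flags, it adds no model calls to the hotpath, matching the zero-latency claim above.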