Replace gpt-researcher with a minimal in-house research engine #2

## Summary

Replace the `gpt-researcher` dependency with a ~300–400 line in-house research engine. Our actual runtime path uses a small fraction of what gpt-researcher ships, and the transitive dep graph is what's driving the bundle bloat tracked in #1.

This is follow-on work to #1 (`.mcpbignore` trims, done in 53ffc73, saved ~6 MB uncompressed). Those trims are merged, but pruning individual vendored packages was judged too fragile. This issue is the durable alternative: stop depending on `gpt-researcher` entirely.

## What we actually use from gpt-researcher

I traced the import path against our vendored `deps/`. In our config (`RETRIEVER=tavily`, Anthropic for all LLM slots, `report_source=web`), the runtime hits exactly five responsibilities:

1. **Plan** — LLM turns the user query into 3–5 sub-queries
2. **Search** — Tavily API per sub-query → URLs + cleaned text
3. **Chunk + embed** — tiktoken + OpenAI `text-embedding-3-small` on retrieved content
4. **Rank** — cosine similarity, top-k chunks per sub-query
5. **Write** — LLM writes the markdown report from compressed context

Everything else in `gpt-researcher` (multi-retriever fallbacks, litellm routing, local doc loaders, vector store integrations, browser/nodriver scraping, hybrid/Azure/LangChainDocuments report sources, the `DocumentLoader` chain that drags in `unstructured` → `spacy`/`thinc`/`blis` at *import* time) is never reached.
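
Steps 3 and 4 are small enough to sketch without numpy. The following is a minimal sketch, not the final `chunker.py`/`retriever.py`: the function names, the 400-token window, and the 50-token overlap are assumptions for illustration. Token IDs would come from something like `tiktoken.get_encoding("cl100k_base").encode(text)`, and the vectors from the embeddings API; both are kept out of the sketch so it runs offline.

```python
import math
from typing import Sequence


def chunk_tokens(tokens: Sequence[int], size: int = 400, overlap: int = 50) -> list[list[int]]:
    """Split a token sequence into fixed-size windows that overlap slightly.

    `size` and `overlap` are illustrative defaults, not tuned values.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[list[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in pure Python; returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]], k: int = 5) -> list[int]:
    """Return the indices of the k chunks most similar to the query vector."""
    order = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return order[:k]
```

For example, `top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2)` ranks the exact match first and the diagonal vector second.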

## Scope

Add `mcp_research/engine/` with:

| Module | Rough size | Purpose |
|---|---|---|
| `planner.py` | ~50 lines | Query → sub-queries via Anthropic |
| `search.py` | ~30 lines | Tavily wrapper (use `include_raw_content=True` so we don't need our own scraper) |
| `chunker.py` | ~80 lines | tiktoken-based chunking + OpenAI embeddings |
| `retriever.py` | ~30 lines | Top-k cosine similarity |
| `writer.py` | ~50 lines | Context → markdown report via Anthropic |
| `orchestrator.py` | ~100 lines | The run loop with progress callbacks |

Existing `src/mcp_research/worker.py` stays — it already handles entity updates, progress streaming to `ctx`, timeouts, cancel/failure transitions, and the orphan reaper. Only the `GPTResearcher(...)` / `conduct_research()` / `write_report()` calls get swapped.
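
As a rough picture of what `orchestrator.py` could look like, here is a sketch of the run loop with the stages injected as callables. The name `run_research`, the callback signature `(stage, fraction)`, and the synchronous style are all assumptions; the real version would likely be async to match `worker.py`.

```python
from typing import Callable, Optional

# Hypothetical callback shape: (stage name, fraction of sub-queries done).
ProgressFn = Callable[[str, float], None]


def run_research(
    query: str,
    *,
    plan: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    rank: Callable[[str, list[str]], list[str]],
    write: Callable[[str, list[str]], str],
    on_progress: Optional[ProgressFn] = None,
) -> str:
    """Plan, then search and rank per sub-query, then write the report."""
    notify = on_progress or (lambda stage, fraction: None)

    notify("plan", 0.0)
    sub_queries = plan(query)

    context: list[str] = []
    for done, sub_query in enumerate(sub_queries, start=1):
        docs = search(sub_query)
        # `rank` stands in for chunk + embed + top-k in this sketch.
        context.extend(rank(sub_query, docs))
        notify("search", done / len(sub_queries))

    notify("write", 1.0)
    report = write(query, context)
    notify("done", 1.0)
    return report
```

Injecting the stages keeps the loop testable with plain fakes, which is the same property the existing `FakeGPTR` fixture provides today.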

## Dep footprint

| Package | Size | Purpose |
|---|---|---|
| `anthropic` | 8 MB | planner + writer |
| `openai` | 12 MB | embeddings |
| `tavily-python` | <1 MB | search |
| `tiktoken` | 3 MB | chunking |
| `numpy` | 22 MB | cosine sim (optional — pure Python works at 1536-dim) |
| common transitive (httpx, pydantic) | ~5 MB | already required |

**Estimated total: ~50 MB uncompressed / ~15 MB compressed.** Down from 541 MB / 166 MB at 0.1.0 — roughly 10× smaller.
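
The estimate can be checked directly from the table. This sketch counts `tavily-python` at its 1 MB upper bound and takes the `~5 MB` transitive figure at face value; both are rounding assumptions.

```python
# Uncompressed sizes in MB, copied from the table above.
sizes_mb = {
    "anthropic": 8,
    "openai": 12,
    "tavily-python": 1,       # listed as "<1 MB"; upper bound used here
    "tiktoken": 3,
    "numpy": 22,
    "common transitive": 5,   # listed as "~5 MB"
}

total_mb = sum(sizes_mb.values())
without_numpy_mb = total_mb - sizes_mb["numpy"]
reduction = 541 / total_mb  # 541 MB is the 0.1.0 uncompressed size

print(f"total: {total_mb} MB, without numpy: {without_numpy_mb} MB, reduction: {reduction:.1f}x")
```

Dropping numpy in favor of pure-Python cosine similarity would remove the single largest line item.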

## What we lose (and don't use)

- Multi-retriever support (DDG, Bing, Google, SerpAPI, …) — we hardcode Tavily
- Multi-LLM routing via litellm — we hardcode Anthropic for fast/smart/strategic slots
- Local document loaders (PDF, docx, md, csv, xlsx) — we run web-only
- Vector store integrations (FAISS, Chroma, Pinecone, …) — in-memory is fine for per-run context
- Browser scraping (nodriver, playwright) — Tavily `include_raw_content` replaces this
- Hybrid/Azure/LangChainDocuments report sources — we only use `web`
- Report-type variants (outline, detailed, resource report, …) — we only use `research_report`
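
To illustrate how `include_raw_content` stands in for a scraper, here is a sketch of the Tavily wrapper. The function name and the returned `{"url", "text"}` shape are assumptions; `client` is expected to be a `tavily.TavilyClient` (or anything with a compatible `search` method), which keeps the sketch runnable without network access.

```python
from typing import Any


def search_subquery(client: Any, sub_query: str, max_results: int = 5) -> list[dict[str, str]]:
    """Run one sub-query and return URL + text pairs.

    Prefers the full page text (`raw_content`) and falls back to the
    short `content` snippet when Tavily returns no raw content.
    """
    response = client.search(
        sub_query,
        search_depth="advanced",
        include_raw_content=True,
        max_results=max_results,
    )
    docs: list[dict[str, str]] = []
    for hit in response.get("results", []):
        text = hit.get("raw_content") or hit.get("content") or ""
        if text:  # skip hits with nothing usable
            docs.append({"url": hit["url"], "text": text})
    return docs
```

In real use the client would be built once, for example `TavilyClient(api_key=...)`, and passed in by the orchestrator.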

## Risks

1. **Report quality regression.** gpt-researcher's prompts are tuned from many real user runs; ours won't be on day one. This is the primary validation gate. Mitigation: borrow prompt structure from gpt-researcher (Apache 2.0; confirm the license before copying) and run a 10-query eval harness comparing both pipelines side-by-side.
2. **PDF URLs from Tavily.** We'll depend on Tavily's advanced search returning usable content for PDF results. If `raw_content` is weak for PDFs, fall back to `pypdf` (5 MB, pure Python, unlike pymupdf's 51 MB C extension).
3. **Maintenance.** ~300–400 lines we own vs. chasing `gpt-researcher` version bumps. Net effect is probably a wash or favorable.
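
For risk 2, the fallback could be gated on a simple check. This is a sketch only: the 500-character threshold and both function names are assumptions, and what counts as "weak" would need to be settled during the eval. `pypdf` is imported lazily so it is only required when the fallback actually runs.

```python
import io
from typing import Optional

MIN_USEFUL_CHARS = 500  # assumption: below this, treat raw_content as weak


def needs_pdf_fallback(url: str, raw_content: Optional[str]) -> bool:
    """True when the URL looks like a PDF and Tavily's text is too short."""
    path = url.lower().split("?", 1)[0]
    is_pdf = path.endswith(".pdf")
    return is_pdf and len((raw_content or "").strip()) < MIN_USEFUL_CHARS


def extract_pdf_text(data: bytes) -> str:
    """Extract text from PDF bytes with pypdf (only imported when needed)."""
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Downloading the PDF bytes is left out; it would reuse whatever HTTP client the engine already depends on.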

## Plan

1. Feature branch `engine/in-house`
2. Implement the six modules above; keep `worker.py`'s public contract unchanged
3. Write an eval harness: run 10 canned research queries through both engines, diff reports on length, source coverage, factual accuracy, readability
4. If parity → ship as 0.2.0 with `gpt-researcher` fully removed
5. If quality gap → tune prompts once or bail (revisit targeted dep pruning as plan B)
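
The mechanical half of step 3 could start as small as the sketch below. It only covers report length and source coverage (distinct URLs cited); factual accuracy and readability still need a human or LLM judge. The function names are assumptions, and the engines are modeled as plain synchronous callables for simplicity.

```python
import re
from typing import Callable, Optional

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def report_metrics(report: str) -> dict[str, int]:
    """Word count and number of distinct URLs cited in a markdown report."""
    urls = {u.rstrip(".,;") for u in URL_RE.findall(report)}
    return {"words": len(report.split()), "unique_sources": len(urls)}


def compare_engines(
    queries: list[str],
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
) -> list[dict]:
    """Run each query through both engines and collect side-by-side metrics."""
    rows = []
    for query in queries:
        base = report_metrics(baseline(query))
        cand = report_metrics(candidate(query))
        ratio: Optional[float] = cand["words"] / base["words"] if base["words"] else None
        rows.append({"query": query, "baseline": base, "candidate": cand, "word_ratio": ratio})
    return rows
```

A `word_ratio` far below 1.0 or a sharp drop in `unique_sources` would be an early signal of a quality gap before anyone reads the reports.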

## Acceptance criteria

- [ ] `mcp_research/engine/` module implementing all five responsibilities
- [ ] `gpt-researcher`, `langchain-anthropic`, and their transitive chain removed from `pyproject.toml`
- [ ] Existing test suite passes with the `FakeGPTR` fixture replaced by an equivalent fake engine
- [ ] New eval harness comparing engine vs baseline on 10 queries, checked in under `tests/eval/`
- [ ] Bundle size <25 MB compressed / <75 MB uncompressed
- [ ] `start_research` smoke test passes end-to-end on a vanilla agent-platform pod
- [ ] Prompts documented in `docs/prompts.md` (or inline with provenance if lifted from gpt-researcher)
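
The bundle size criterion is easy to enforce in CI. A minimal sketch, assuming the build leaves both an unpacked directory and a packed archive on disk, and assuming "MB" means 1024 × 1024 bytes; the function names are hypothetical.

```python
import os

MB = 1024 * 1024  # assumption: binary megabytes


def dir_size_bytes(root: str) -> int:
    """Total size of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def bundle_violations(
    unpacked_dir: str,
    archive_path: str,
    max_uncompressed_mb: int = 75,
    max_compressed_mb: int = 25,
) -> list[str]:
    """Return a message for each size budget the bundle does not meet."""
    problems = []
    uncompressed = dir_size_bytes(unpacked_dir)
    compressed = os.path.getsize(archive_path)
    if uncompressed >= max_uncompressed_mb * MB:
        problems.append(f"uncompressed {uncompressed / MB:.1f} MB is not under {max_uncompressed_mb} MB")
    if compressed >= max_compressed_mb * MB:
        problems.append(f"compressed {compressed / MB:.1f} MB is not under {max_compressed_mb} MB")
    return problems
```

An empty list means both budgets are met; a CI step could fail the build whenever the list is non-empty.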

## Related

- #1 (bundle size issue). The `.mcpbignore` fixes from that issue are already in; this issue is the durable follow-up.
