A Multi-LLM Deliberation Framework for Research Paper Analysis and Academic Question Answering.
Built from scratch, inspired by Andrej Karpathy's LLM Council project.
The system routes a research query through four specialized LLMs simultaneously, anonymizes their responses, and uses a meta-model as an impartial judge to evaluate and synthesize a final answer.
User Query → Council Models (Parallel) → Anonymization → Meta-Model → Final Answer
Stage 0 Stage 1 Stage 2 Stage 3 Stage 4
llm-council/
│
├── README.md
├── Text_Model_Judge_Response.txt ← Judge prompt used by the meta-model
│
├── ├── LLM_Council_OpenRouter.ipynb ← Multi-provider version
│ └── LLM_Council_OpenAI.ipynb ← OpenAI-only version with PDF support
│
└── ── LLM_Council_Report.pdf ← Full project report
Uses the OpenRouter API to access models from multiple providers via the Chat Completions API.
| Model | Provider | Research Strength |
|---|---|---|
deepseek/deepseek-v3.2 |
DeepSeek | Mathematical and logical reasoning |
qwen/qwen3-235b-a22b |
Alibaba | Literature breadth, long context |
anthropic/claude-sonnet-4-6 |
Anthropic | Academic writing and analysis |
google/gemini-3-flash-preview |
Factual grounding, speed |
Meta-Model (Chairman) : openai/gpt-5.4
Uses the native OpenAI Responses API with support for PDF ingestion via file upload or URL.
| Model | Family | Research Strength |
|---|---|---|
gpt-5.4-mini |
Language Model | Fast, efficient general QA |
o3 |
Reasoning Model | Deep multi-step academic reasoning |
gpt-5.1 |
Language Model | Balanced research analysis |
gpt-5.2 |
Language Model | Strong academic writing and synthesis |
Meta-Model (Chairman) : gpt-5.4
| Mode | Description |
|---|---|
text |
Plain research query or pasted abstract |
url |
Direct link to a PDF (e.g. arXiv URL) |
upload |
Upload a local PDF file via OpenAI Files API |
All four council models receive the user query simultaneously via Python threading. Total latency ≈ slowest model, not sum of all models.
Model identities are stripped and replaced with neutral labels (Response A, B, C, D) to eliminate self-preference bias — the tendency of a model to favour its own output.
The meta-model (Chairman) receives the original query and all anonymized responses. It evaluates each response and synthesizes a final consolidated answer.
Each council response is scored by the meta-model on four academic criteria:
| Metric | Description | Score |
|---|---|---|
| Correctness | Factual accuracy of the content | /10 |
| Clarity | Structure and readability | /10 |
| Completeness | Coverage of methodology, findings, limitations | /10 |
| Academic Depth | Goes beyond surface-level explanation | /10 |
| Total | /40 |
- Python 3.10+
openailibrary (pip install openai)- Google Colab (recommended)
- Get your API key from openrouter.ai
- In Google Colab go to
Tools → Secrets → Add new secret - Name :
OpenRouter, Value : your API key - Upload
Text_Model_Judge_Response.txtto/content/ - Open and run
LLM_Council_OpenRouter.ipynb
- Get your API key from platform.openai.com
- In Google Colab go to
Tools → Secrets → Add new secret - Name :
OpenAI, Value : your API key - Upload
Text_Model_Judge_Response.txtto/content/ - Open and run
LLM_Council_OpenAI.ipynb - When prompted, select a mode :
text— type your research query directlyurl— paste a PDF link (e.g.https://arxiv.org/pdf/1706.03762)upload— upload your PDF to/content/and enter the file path
| Feature | OpenRouter Version | OpenAI Version |
|---|---|---|
| API | Chat Completions | Responses API |
| SDK call | client.chat.completions.create() |
client.responses.create() |
| Response field | response.choices[0].message.content |
response.output_text |
| Model diversity | Cross-provider | Architectural (reasoning vs language) |
| PDF support | Not supported | Native via file_id / file_url |
| File upload | Not supported | OpenAI Files API |
=== Research Paper Assistant (LLM Council) ===
Options :
1. Ask a question about a research topic
2. Paste a paper abstract for analysis
3. Ask for a literature summary on a topic
Enter your research query : On Layer Normalization in Transformers
--- Querying Council Models ---
llm_4 (google/gemini-3-flash-preview) → Success | Latency : 6.81s
llm_3 (anthropic/claude-sonnet-4-6) → Success | Latency : 17.02s
llm_2 (qwen/qwen3-235b-a22b) → Success | Latency : 23.61s
llm_1 (deepseek/deepseek-v3.2) → Success | Latency : 39.47s
--- Council Deliberation Complete ---
--- Final Research Analysis (Meta Model) ---
Response A :
- Correctness : 8/10
- Clarity : 9/10
- Completeness : 9/10
- Academic Depth : 8/10
- Total : 34/40
Rankings (Best to Worst) : A > D > B > C
Final Synthesized Answer :
...
Andrej Karpathy's LLM Council — https://github.com/karpathy/llm-council
- Vaswani, A. et al. (2017). Attention is All You Need. NeurIPS.
- Karpathy, A. (2025). LLM Council. GitHub. https://github.com/karpathy/llm-council
- Zheng, L. et al. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. arXiv:2306.05685.
- OpenAI. (2025). Responses API Reference. https://platform.openai.com/docs/api-reference/responses
- OpenAI. (2025). Files API Reference. https://platform.openai.com/docs/api-reference/files
- Dietterich, T. G. (2000). Ensemble Methods in Machine Learning. Springer.
- Wang, P. et al. (2023). Large language models are not yet human-level evaluators. arXiv:2305.13091.