LLM Council — Research Paper Assistant

Name : Shoeb Shakil Sutar

PRN : 25070149022

Subject : Large Language Models

A Multi-LLM Deliberation Framework for Research Paper Analysis and Academic Question Answering.

Built from scratch, inspired by Andrej Karpathy's LLM Council project.

Overview

The system routes a research query through four specialized LLMs simultaneously, anonymizes their responses, and uses a meta-model as an impartial judge to evaluate and synthesize a final answer.

Pipeline

User Query → Council Models (Parallel) → Anonymization → Meta-Model → Final Answer
  Stage 0         Stage 1                   Stage 2         Stage 3      Stage 4

Repository Structure

llm-council/
│
├── README.md
├── Text_Model_Judge_Response.txt        ← Judge prompt used by the meta-model
│
├── ├── LLM_Council_OpenRouter.ipynb     ← Multi-provider version
│   └── LLM_Council_OpenAI.ipynb         ← OpenAI-only version with PDF support
│
└── ── LLM_Council_Report.pdf           ← Full project report

Versions

Version 1 — OpenRouter (Multi-Provider Council)

Uses the OpenRouter API to access models from multiple providers via the Chat Completions API.

Model	Provider	Research Strength
`deepseek/deepseek-v3.2`	DeepSeek	Mathematical and logical reasoning
`qwen/qwen3-235b-a22b`	Alibaba	Literature breadth, long context
`anthropic/claude-sonnet-4-6`	Anthropic	Academic writing and analysis
`google/gemini-3-flash-preview`	Google	Factual grounding, speed

Meta-Model (Chairman) : openai/gpt-5.4

Version 2 — OpenAI Only (Responses API + PDF Support)

Uses the native OpenAI Responses API with support for PDF ingestion via file upload or URL.

Model	Family	Research Strength
`gpt-5.4-mini`	Language Model	Fast, efficient general QA
`o3`	Reasoning Model	Deep multi-step academic reasoning
`gpt-5.1`	Language Model	Balanced research analysis
`gpt-5.2`	Language Model	Strong academic writing and synthesis

Meta-Model (Chairman) : gpt-5.4

Input Modes (Version 2 only)

Mode	Description
`text`	Plain research query or pasted abstract
`url`	Direct link to a PDF (e.g. arXiv URL)
`upload`	Upload a local PDF file via OpenAI Files API

How It Works

Stage 1 — Parallel Inference

All four council models receive the user query simultaneously via Python threading. Total latency ≈ slowest model, not sum of all models.

Stage 2 — Anonymization

Model identities are stripped and replaced with neutral labels (Response A, B, C, D) to eliminate self-preference bias — the tendency of a model to favour its own output.

Stage 3 — Meta-Model Deliberation

The meta-model (Chairman) receives the original query and all anonymized responses. It evaluates each response and synthesizes a final consolidated answer.

Evaluation Criteria

Each council response is scored by the meta-model on four academic criteria:

Metric	Description	Score
Correctness	Factual accuracy of the content	/10
Clarity	Structure and readability	/10
Completeness	Coverage of methodology, findings, limitations	/10
Academic Depth	Goes beyond surface-level explanation	/10
Total		/40

Setup and Usage

Requirements

Python 3.10+
openai library (pip install openai)
Google Colab (recommended)

Version 1 — OpenRouter Setup

Get your API key from openrouter.ai
In Google Colab go to Tools → Secrets → Add new secret
Name : OpenRouter, Value : your API key
Upload Text_Model_Judge_Response.txt to /content/
Open and run LLM_Council_OpenRouter.ipynb

Version 2 — OpenAI Setup

Get your API key from platform.openai.com
In Google Colab go to Tools → Secrets → Add new secret
Name : OpenAI, Value : your API key
Upload Text_Model_Judge_Response.txt to /content/
Open and run LLM_Council_OpenAI.ipynb
When prompted, select a mode :
- text — type your research query directly
- url — paste a PDF link (e.g. https://arxiv.org/pdf/1706.03762)
- upload — upload your PDF to /content/ and enter the file path

Key Differences Between Versions

Feature	OpenRouter Version	OpenAI Version
API	Chat Completions	Responses API
SDK call	`client.chat.completions.create()`	`client.responses.create()`
Response field	`response.choices[0].message.content`	`response.output_text`
Model diversity	Cross-provider	Architectural (reasoning vs language)
PDF support	Not supported	Native via `file_id` / `file_url`
File upload	Not supported	OpenAI Files API

Sample Output

=== Research Paper Assistant (LLM Council) ===
Options :
  1. Ask a question about a research topic
  2. Paste a paper abstract for analysis
  3. Ask for a literature summary on a topic

Enter your research query : On Layer Normalization in Transformers

--- Querying Council Models ---
  llm_4 (google/gemini-3-flash-preview) → Success | Latency : 6.81s
  llm_3 (anthropic/claude-sonnet-4-6)   → Success | Latency : 17.02s
  llm_2 (qwen/qwen3-235b-a22b)          → Success | Latency : 23.61s
  llm_1 (deepseek/deepseek-v3.2)        → Success | Latency : 39.47s

--- Council Deliberation Complete ---

--- Final Research Analysis (Meta Model) ---
Response A :
  - Correctness     : 8/10
  - Clarity         : 9/10
  - Completeness    : 9/10
  - Academic Depth  : 8/10
  - Total           : 34/40

Rankings (Best to Worst) : A > D > B > C

Final Synthesized Answer :
...

Inspired By

Andrej Karpathy's LLM Council — https://github.com/karpathy/llm-council

References

Vaswani, A. et al. (2017). Attention is All You Need. NeurIPS.
Karpathy, A. (2025). LLM Council. GitHub. https://github.com/karpathy/llm-council
Zheng, L. et al. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. arXiv:2306.05685.
OpenAI. (2025). Responses API Reference. https://platform.openai.com/docs/api-reference/responses
OpenAI. (2025). Files API Reference. https://platform.openai.com/docs/api-reference/files
Dietterich, T. G. (2000). Ensemble Methods in Machine Learning. Springer.
Wang, P. et al. (2023). Large language models are not yet human-level evaluators. arXiv:2305.13091.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM Council — Research Paper Assistant

Name : Shoeb Shakil Sutar

PRN : 25070149022

Subject : Large Language Models

Overview

Pipeline

Repository Structure

Versions

Version 1 — OpenRouter (Multi-Provider Council)

Version 2 — OpenAI Only (Responses API + PDF Support)

Input Modes (Version 2 only)

How It Works

Stage 1 — Parallel Inference

Stage 2 — Anonymization

Stage 3 — Meta-Model Deliberation

Evaluation Criteria

Setup and Usage

Requirements

Version 1 — OpenRouter Setup

Version 2 — OpenAI Setup

Key Differences Between Versions

Sample Output

Inspired By

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
25070149022_LLM_Mini_Project.pdf		25070149022_LLM_Mini_Project.pdf
LLM_Council_OpenAI.ipynb		LLM_Council_OpenAI.ipynb
LLM_Council_OpenRouter.ipynb		LLM_Council_OpenRouter.ipynb
README.md		README.md
Text_Judge_Model_Response.txt		Text_Judge_Model_Response.txt

Folders and files

Latest commit

History

Repository files navigation

LLM Council — Research Paper Assistant

Name : Shoeb Shakil Sutar

PRN : 25070149022

Subject : Large Language Models

Overview

Pipeline

Repository Structure

Versions

Version 1 — OpenRouter (Multi-Provider Council)

Version 2 — OpenAI Only (Responses API + PDF Support)

Input Modes (Version 2 only)

How It Works

Stage 1 — Parallel Inference

Stage 2 — Anonymization

Stage 3 — Meta-Model Deliberation

Evaluation Criteria

Setup and Usage

Requirements

Version 1 — OpenRouter Setup

Version 2 — OpenAI Setup

Key Differences Between Versions

Sample Output

Inspired By

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages