feat: add ranking conflict detection between models#130
Open
bledden wants to merge 4 commits into karpathy:master
Conversation
Adds calculate_tournament_rankings() as an alternative to simple mean ranking.

Algorithm:
- Convert ordinal rankings to pairwise matchups
- For each pair of models, a majority vote determines the winner
- Ties award 0.5 points to each model
- Final score = wins / total matchups

Benefits over mean ranking:
- More robust to outlier rankings
- Theoretically principled (Condorcet-style)
- Handles cyclic preferences gracefully

Both ranking methods are now included in metadata:
- aggregate_rankings: mean position (existing)
- tournament_rankings: pairwise win percentage (new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
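The algorithm described above can be sketched as follows. The function name matches the PR, but the input shape (a list of per-ranker position dicts) and the exact tie handling are assumptions, not the PR's actual code:

```python
from itertools import combinations

def calculate_tournament_rankings(rankings):
    """Condorcet-style pairwise scoring (sketch).

    rankings: list of per-ranker dicts mapping model -> ordinal
    position (1 = best). Returns {model: wins / matchups_played}.
    """
    models = sorted({m for r in rankings for m in r})
    wins = {m: 0.0 for m in models}
    for a, b in combinations(models, 2):
        a_votes = sum(1 for r in rankings if r[a] < r[b])
        b_votes = sum(1 for r in rankings if r[b] < r[a])
        if a_votes > b_votes:
            wins[a] += 1.0          # majority vote decides the matchup
        elif b_votes > a_votes:
            wins[b] += 1.0
        else:
            wins[a] += 0.5          # tie: half a point to each
            wins[b] += 0.5
    # Each model plays one matchup against every other model.
    return {m: wins[m] / (len(models) - 1) for m in models}
```

With three unanimous ballots A > B > C, A wins both of its matchups (score 1.0), B wins one of two (0.5), and C wins none (0.0).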
Documents the tournament-style pairwise comparison algorithm with:
- An explanation of why it is more robust than mean averaging
- A concrete example showing a self-promotion bias scenario
- Tables comparing mean vs. tournament results
- Outlier-robustness validation (mean degrades 1.0→1.5, tournament stays at 100%)
- A summary of validation test coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
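The outlier-robustness claim is easy to reproduce with a toy example (model names and ballot shape are illustrative):

```python
# Three rankers agree A > B > C; one outlier buries A.
rankings = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 2, "C": 3},
    {"A": 3, "B": 1, "C": 2},  # outlier ballot
]

# Mean position for A degrades from 1.0 to (1+1+1+3)/4 = 1.5.
mean_a = sum(r["A"] for r in rankings) / len(rankings)

def win_rate(model, rankings):
    """Fraction of pairwise matchups the model wins by majority vote."""
    others = [m for m in rankings[0] if m != model]
    wins = sum(
        1 for other in others
        if sum(1 for r in rankings if r[model] < r[other]) > len(rankings) / 2
    )
    return wins / len(others)

# A still beats B and C on 3 of 4 ballots each, so its pairwise
# win rate stays at 100% despite the outlier.
tournament_a = win_rate("A", rankings)
```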
Detects when ≥30% of rankers significantly disagree with the consensus ranking for a model (placing it more than one position away from consensus).

Backend changes:
- Add detect_minority_opinions() function to council.py
- Uses tournament ranking as the consensus baseline
- Reports dissent rate, positions, dissenters, and direction (overvalued/undervalued)
- Configurable threshold (default 30%) and position tolerance (default 1)
- Include minority_opinions in run_full_council metadata

Frontend changes:
- Add minorityOpinions prop to the Stage2 component
- Display minority opinions in a warning-styled card
- Show direction badges (overvalued in red, undervalued in green)
- List consensus position, dissent positions, and dissenter models

Validation tests:
- 8 test cases covering consensus, dissent detection, direction, threshold filtering, tolerance, edge cases, and realistic scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
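A minimal sketch of the detection logic described above. The signature, data shapes, and the direction convention (dissenters placing a model worse than consensus means the consensus "overvalues" it) are assumptions, not the PR's actual code:

```python
def detect_minority_opinions(ranker_positions, consensus_positions,
                             threshold=0.30, tolerance=1):
    """Flag models that >= `threshold` of rankers place more than
    `tolerance` positions away from the consensus position.

    ranker_positions: {ranker: {model: position}} (1 = best)
    consensus_positions: {model: position}, e.g. derived from the
    tournament ranking used as the consensus baseline.
    """
    opinions = []
    n_rankers = len(ranker_positions)
    for model, consensus in consensus_positions.items():
        dissenters = {
            ranker: positions[model]
            for ranker, positions in ranker_positions.items()
            if abs(positions[model] - consensus) > tolerance
        }
        if n_rankers and len(dissenters) / n_rankers >= threshold:
            # Most dissenters placing the model *worse* than consensus
            # suggests the consensus overvalues it, and vice versa.
            worse = sum(1 for p in dissenters.values() if p > consensus)
            opinions.append({
                "model": model,
                "consensus_position": consensus,
                "dissent_rate": len(dissenters) / n_rankers,
                "dissenters": sorted(dissenters),
                "dissent_positions": sorted(dissenters.values()),
                "direction": ("overvalued" if worse >= len(dissenters) - worse
                              else "undervalued"),
            })
    return opinions
```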
Detects fundamental disagreements between models based on how they rank each other. Two types of conflicts are identified:

1. Mutual Opposition (high severity): both models rank the other poorly while ranking themselves highly, indicating fundamental disagreement about response quality.
2. Ranking Swap (medium severity): a large position difference in how the models rank each other; one places the other high, the other places them low.

Backend changes:
- Add detect_ranking_conflicts() function to council.py
- Builds a ranking matrix showing how each model ranked every other model
- Detects mutual opposition and ranking swaps with configurable thresholds
- Returns conflict type, severity, and detailed ranking positions
- Include ranking_conflicts in run_full_council metadata

Frontend changes:
- Add rankingConflicts prop to the Stage2 component
- Display conflicts in a red-styled card (distinct from the yellow minority-opinion card)
- Severity badges (high = red, medium = orange, low = yellow)
- Show which models are in conflict and their mutual rankings

Validation tests:
- 8 test cases covering agreement, mutual opposition, ranking swaps, edge cases, 5-model scenarios, detail population, and severity ordering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Adds detection of fundamental disagreements between models based on how they rank each other during Stage 2.
This helps distinguish fundamental disagreements about response quality from ordinary stylistic ranking differences.
Conflict Types
Mutual Opposition (High Severity)
Both models rank each other poorly while ranking themselves highly. This pattern strongly indicates fundamental disagreement rather than stylistic preferences.
Example: Model A ranks itself #1 and Model B #3, while Model B ranks itself #1 and Model A #3.
Ranking Swap (Medium Severity)
Large position difference in mutual rankings. One model places the other high, but the other places them low.
Example: Model A ranks Model B #1, but Model B ranks Model A #4.
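The two conflict types above can be sketched roughly as follows. The function name matches the PR, but the input shape, the "rank itself first / other in the bottom two" rule for mutual opposition, and the swap threshold are illustrative assumptions:

```python
from itertools import combinations

def detect_ranking_conflicts(ranker_rankings, swap_threshold=3):
    """Sketch of pairwise conflict detection.

    ranker_rankings[ranker][ranked] = position (1 = best),
    where every model also ranks itself.
    """
    conflicts = []
    models = sorted(ranker_rankings)
    n = len(models)
    for a, b in combinations(models, 2):
        a_on_b = ranker_rankings[a][b]
        b_on_a = ranker_rankings[b][a]
        self_a = ranker_rankings[a][a]
        self_b = ranker_rankings[b][b]
        # Mutual opposition: each ranks itself first and the other
        # in the bottom two positions.
        if (self_a == 1 and self_b == 1
                and a_on_b >= n - 1 and b_on_a >= n - 1):
            conflicts.append({"models": (a, b),
                              "type": "mutual_opposition",
                              "severity": "high",
                              "rankings": {a: a_on_b, b: b_on_a}})
        # Ranking swap: large asymmetry in how they rank each other.
        elif abs(a_on_b - b_on_a) >= swap_threshold:
            conflicts.append({"models": (a, b),
                              "type": "ranking_swap",
                              "severity": "medium",
                              "rankings": {a: a_on_b, b: b_on_a}})
    # High-severity conflicts first.
    conflicts.sort(key=lambda c: c["severity"] != "high")
    return conflicts
```

For the mutual-opposition example above (A and B each rank themselves #1 and the other #3), the pair is flagged as high severity, while an agreeing third model produces no conflict.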
Changes
Backend (backend/council.py):
- Added detect_ranking_conflicts() function
- Builds a ranking matrix: ranker_rankings[ranker_model][ranked_model] = position

Frontend:
- Stage2.jsx: Added rankingConflicts prop and display component
- Stage2.css: Red-styled card (distinct from the yellow minority-opinion card)
- ChatInterface.jsx: Pass ranking_conflicts to Stage2

Validation

8 test cases in tests/test_ranking_conflicts.py:
- test_no_conflict_when_agreement
- test_mutual_opposition_detected
- test_ranking_swap_detected
- test_empty_inputs
- test_single_model_no_conflict
- test_5_model_conflict_scenario
- test_conflict_details_populated
- test_severity_ordering

Dependencies
This PR builds on:
Test plan
- python3 tests/test_ranking_conflicts.py: all 8 tests pass
- python3 tests/test_minority_opinions.py: all 8 tests pass

🤖 Generated with Claude Code