feat: add ranking conflict detection between models#130
Open
bledden wants to merge 4 commits into karpathy:master
Conversation
Adds calculate_tournament_rankings() as an alternative to simple mean ranking.

Algorithm:
- Convert ordinal rankings to pairwise matchups
- For each pair of models, a majority vote determines the winner
- Ties award 0.5 points to each model
- Final score = wins / total matchups

Benefits over mean ranking:
- More robust to outlier rankings
- Theoretically principled (Condorcet-style)
- Handles cyclic preferences gracefully

Both ranking methods are now included in metadata:
- aggregate_rankings: mean position (existing)
- tournament_rankings: pairwise win percentage (new)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
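The algorithm described above can be sketched as follows. The function name matches the PR, but the input shape (a list of per-ranker position dicts) and the exact tie handling are assumptions, not the PR's actual code:

```python
from itertools import combinations

def calculate_tournament_rankings(rankings):
    """Condorcet-style pairwise scoring (sketch).

    rankings: list of per-ranker dicts mapping model -> ordinal
    position (1 = best). Returns {model: wins / matchups_played}.
    """
    models = sorted({m for r in rankings for m in r})
    wins = {m: 0.0 for m in models}
    for a, b in combinations(models, 2):
        a_votes = sum(1 for r in rankings if r[a] < r[b])
        b_votes = sum(1 for r in rankings if r[b] < r[a])
        if a_votes > b_votes:
            wins[a] += 1.0          # majority vote decides the matchup
        elif b_votes > a_votes:
            wins[b] += 1.0
        else:
            wins[a] += 0.5          # tie: half a point to each
            wins[b] += 0.5
    # Each model plays one matchup against every other model.
    return {m: wins[m] / (len(models) - 1) for m in models}
```

With three unanimous ballots A > B > C, A wins both of its matchups (score 1.0), B wins one of two (0.5), and C wins none (0.0).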
Documents the tournament-style pairwise comparison algorithm with:
- An explanation of why it is more robust than mean averaging
- A concrete example showing a self-promotion bias scenario
- Tables comparing mean vs. tournament results
- Outlier-robustness validation (mean degrades 1.0→1.5, tournament stays at 100%)
- A summary of validation test coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
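The outlier-robustness claim is easy to reproduce with a toy example (model names and ballot shape are illustrative):

```python
# Three rankers agree A > B > C; one outlier buries A.
rankings = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 2, "C": 3},
    {"A": 3, "B": 1, "C": 2},  # outlier ballot
]

# Mean position for A degrades from 1.0 to (1+1+1+3)/4 = 1.5.
mean_a = sum(r["A"] for r in rankings) / len(rankings)

def win_rate(model, rankings):
    """Fraction of pairwise matchups the model wins by majority vote."""
    others = [m for m in rankings[0] if m != model]
    wins = sum(
        1 for other in others
        if sum(1 for r in rankings if r[model] < r[other]) > len(rankings) / 2
    )
    return wins / len(others)

# A still beats B and C on 3 of 4 ballots each, so its pairwise
# win rate stays at 100% despite the outlier.
tournament_a = win_rate("A", rankings)
```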
Detects when ≥30% of rankers significantly disagree with the consensus ranking for a model (placing it more than one position away from consensus).

Backend changes:
- Add detect_minority_opinions() function to council.py
- Uses tournament ranking as the consensus baseline
- Reports dissent rate, positions, dissenters, and direction (overvalued/undervalued)
- Configurable threshold (default 30%) and position tolerance (default 1)
- Include minority_opinions in run_full_council metadata

Frontend changes:
- Add minorityOpinions prop to the Stage2 component
- Display minority opinions in a warning-styled card
- Show direction badges (overvalued in red, undervalued in green)
- List consensus position, dissent positions, and dissenter models

Validation tests:
- 8 test cases covering consensus, dissent detection, direction, threshold filtering, tolerance, edge cases, and realistic scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
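A minimal sketch of the detection logic described above. The signature, data shapes, and the direction convention (dissenters placing a model worse than consensus means the consensus "overvalues" it) are assumptions, not the PR's actual code:

```python
def detect_minority_opinions(ranker_positions, consensus_positions,
                             threshold=0.30, tolerance=1):
    """Flag models that >= `threshold` of rankers place more than
    `tolerance` positions away from the consensus position.

    ranker_positions: {ranker: {model: position}} (1 = best)
    consensus_positions: {model: position}, e.g. derived from the
    tournament ranking used as the consensus baseline.
    """
    opinions = []
    n_rankers = len(ranker_positions)
    for model, consensus in consensus_positions.items():
        dissenters = {
            ranker: positions[model]
            for ranker, positions in ranker_positions.items()
            if abs(positions[model] - consensus) > tolerance
        }
        if n_rankers and len(dissenters) / n_rankers >= threshold:
            # Most dissenters placing the model *worse* than consensus
            # suggests the consensus overvalues it, and vice versa.
            worse = sum(1 for p in dissenters.values() if p > consensus)
            opinions.append({
                "model": model,
                "consensus_position": consensus,
                "dissent_rate": len(dissenters) / n_rankers,
                "dissenters": sorted(dissenters),
                "dissent_positions": sorted(dissenters.values()),
                "direction": ("overvalued" if worse >= len(dissenters) - worse
                              else "undervalued"),
            })
    return opinions
```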
Detects fundamental disagreements between models based on how they rank each other. Two types of conflicts are identified:

1. Mutual Opposition (high severity): both models rank the other poorly while ranking themselves highly, indicating fundamental disagreement about response quality.
2. Ranking Swap (medium severity): a large position difference in how the models rank each other; one places the other high, the other places them low.

Backend changes:
- Add detect_ranking_conflicts() function to council.py
- Builds a ranking matrix showing how each model ranked every other model
- Detects mutual opposition and ranking swaps with configurable thresholds
- Returns conflict type, severity, and detailed ranking positions
- Include ranking_conflicts in run_full_council metadata

Frontend changes:
- Add rankingConflicts prop to the Stage2 component
- Display conflicts in a red-styled card (distinct from the yellow minority-opinion card)
- Severity badges (high = red, medium = orange, low = yellow)
- Show which models are in conflict and their mutual rankings

Validation tests:
- 8 test cases covering agreement, mutual opposition, ranking swaps, edge cases, 5-model scenarios, detail population, and severity ordering

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Adds detection of fundamental disagreements between models based on how they rank each other during Stage 2.
This helps distinguish fundamental disagreements about response quality from ordinary stylistic ranking differences.
Conflict Types
Mutual Opposition (High Severity)
Both models rank each other poorly while ranking themselves highly. This pattern strongly indicates fundamental disagreement rather than stylistic preferences.
Example: Model A ranks itself #1 and Model B #3, while Model B ranks itself #1 and Model A #3.
Ranking Swap (Medium Severity)
Large position difference in mutual rankings. One model places the other high, but the other places them low.
Example: Model A ranks Model B #1, but Model B ranks Model A #4.
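The two conflict types above can be sketched roughly as follows. The function name matches the PR, but the input shape, the "rank itself first / other in the bottom two" rule for mutual opposition, and the swap threshold are illustrative assumptions:

```python
from itertools import combinations

def detect_ranking_conflicts(ranker_rankings, swap_threshold=3):
    """Sketch of pairwise conflict detection.

    ranker_rankings[ranker][ranked] = position (1 = best),
    where every model also ranks itself.
    """
    conflicts = []
    models = sorted(ranker_rankings)
    n = len(models)
    for a, b in combinations(models, 2):
        a_on_b = ranker_rankings[a][b]
        b_on_a = ranker_rankings[b][a]
        self_a = ranker_rankings[a][a]
        self_b = ranker_rankings[b][b]
        # Mutual opposition: each ranks itself first and the other
        # in the bottom two positions.
        if (self_a == 1 and self_b == 1
                and a_on_b >= n - 1 and b_on_a >= n - 1):
            conflicts.append({"models": (a, b),
                              "type": "mutual_opposition",
                              "severity": "high",
                              "rankings": {a: a_on_b, b: b_on_a}})
        # Ranking swap: large asymmetry in how they rank each other.
        elif abs(a_on_b - b_on_a) >= swap_threshold:
            conflicts.append({"models": (a, b),
                              "type": "ranking_swap",
                              "severity": "medium",
                              "rankings": {a: a_on_b, b: b_on_a}})
    # High-severity conflicts first.
    conflicts.sort(key=lambda c: c["severity"] != "high")
    return conflicts
```

For the mutual-opposition example above (A and B each rank themselves #1 and the other #3), the pair is flagged as high severity, while an agreeing third model produces no conflict.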
Changes
Backend (backend/council.py):
- Added detect_ranking_conflicts() function
- Builds a ranking matrix: ranker_rankings[ranker_model][ranked_model] = position

Frontend:
- Stage2.jsx: Added rankingConflicts prop and display component
- Stage2.css: Red-styled card (distinct from the yellow minority-opinion card)
- ChatInterface.jsx: Pass ranking_conflicts to Stage2

Validation

8 test cases in tests/test_ranking_conflicts.py:
- test_no_conflict_when_agreement
- test_mutual_opposition_detected
- test_ranking_swap_detected
- test_empty_inputs
- test_single_model_no_conflict
- test_5_model_conflict_scenario
- test_conflict_details_populated
- test_severity_ordering

Dependencies
This PR builds on:
Test plan
- python3 tests/test_ranking_conflicts.py: all 8 tests pass
- python3 tests/test_minority_opinions.py: all 8 tests pass

🤖 Generated with Claude Code