Meta: Migrate from Git Hash Versioning to Semantic Versioning #62

@ScuttleBot

Meta-Issue: Migrate from Git Hash Versioning to Semantic Versioning

Summary

Currently, PinchBench uses git commit hashes (short SHA) to identify benchmark versions. This makes the version dropdown on the leaderboard confusing: users see opaque strings like a1b2c3d instead of meaningful version numbers like 1.0, 1.1, 2.0.

This issue tracks the work required to migrate to semantic versioning (SemVer) across all PinchBench components.

Current State (Git Hash Versioning)

How it works today:

  1. Skill repo (scripts/benchmark.py):

    import subprocess
    from pathlib import Path

    def _get_git_version(script_dir: Path) -> str:
        # Ask git for the abbreviated hash of the checked-out commit
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            timeout=2,
            check=False,
            cwd=script_dir,
        )
        return result.stdout.strip()  # e.g., "a1b2c3d"

    The benchmark results JSON includes: "benchmark_version": "a1b2c3d"

  2. API repo (src/routes/results.ts):

    • Accepts benchmark_version in submission payload
    • Stores in submissions.benchmark_version TEXT field
    • Auto-inserts into benchmark_versions table with id = git hash
  3. Database schema (schema.sql):

    CREATE TABLE benchmark_versions (
      id TEXT PRIMARY KEY,  -- Currently stores git hash
      created_at TEXT NOT NULL DEFAULT (datetime('now')),
      current INTEGER NOT NULL DEFAULT 0,
      hidden INTEGER NOT NULL DEFAULT 0
    );
  4. Leaderboard frontend (components/version-selector.tsx):

    • Fetches versions from /api/benchmark_versions
    • Displays version.id (git hash) in dropdown
    • Users must guess which hash corresponds to which benchmark iteration

Problems with current approach:

  • User confusion: Git hashes are meaningless to users
  • No indication of breaking changes: Can't tell if a new version changes the benchmark logic
  • Difficult to communicate: "Try the new version 2a8f9d2" vs "Try version 2.0"
  • No versioning semantics: Major/minor/patch distinctions are impossible

Desired State (Semantic Versioning)

Goals:

  1. Clear version numbers: 1.0, 1.1, 2.0 instead of a1b2c3d
  2. Meaningful increments:
    • MAJOR: Breaking changes to tasks or grading logic
    • MINOR: New tasks, non-breaking improvements
    • PATCH: Bug fixes, task clarifications
  3. Human-readable labels: "Version 1.0 (March 2025)" in the UI
  4. Backward compatibility: Old submissions with git hashes continue to work
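
The increment rules in goal 2 can be illustrated with a small helper. This is only a sketch: the function name `bump_version` and the change-kind strings are hypothetical and not part of any PinchBench repo.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a change of kind 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # breaking change to tasks or grading logic
        return f"{major + 1}.0.0"
    if change == "minor":  # new tasks, non-breaking improvements
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # bug fixes, task clarifications
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")


print(bump_version("1.0.0", "minor"))  # 1.1.0
print(bump_version("1.1.0", "major"))  # 2.0.0
```

Lower components reset to zero on a higher-level bump, which is what keeps "2.0.0" unambiguous as the first release after a breaking change.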

Proposed version format:

MAJOR.MINOR.PATCH  (e.g., "1.0.0")

Simplified for UI: 1.0, 1.1, 2.0 (the patch component is omitted when it is 0)
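
One way to derive the simplified UI form from the full version string is sketched below. It assumes the patch component is hidden only when it is 0, so a patch release such as 2.0.1 stays visible; the helper name `ui_version` is hypothetical.

```python
def ui_version(semver: str) -> str:
    """Shorten 'MAJOR.MINOR.PATCH' for display, dropping a zero patch component."""
    major, minor, patch = (int(part) for part in semver.split("."))
    if patch == 0:
        return f"{major}.{minor}"
    return f"{major}.{minor}.{patch}"


print(ui_version("1.0.0"))  # 1.0
print(ui_version("2.0.1"))  # 2.0.1
```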

Components Requiring Changes

1. Skill Repo (pinchbench/skill)

Files to modify:

  • scripts/benchmark.py:

    • Replace _get_git_version() with _get_benchmark_version()
    • Read from the BENCHMARK_VERSION file, falling back to a git tag
    • Fall back to the short git hash if neither a version file nor a tag is available
  • scripts/lib_upload.py:

    • No changes needed (reads version from results JSON)
  • New file: BENCHMARK_VERSION:

    • Simple text file containing version string (e.g., "1.0.0")
    • Updated manually when releasing new benchmark versions

Proposed implementation:

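import subprocess
from pathlib import Path


def _get_benchmark_version(script_dir: Path) -> str:
    """Read the benchmark version from BENCHMARK_VERSION, a git tag, or the git hash."""
    # 1. Preferred source: the BENCHMARK_VERSION file (e.g., "1.0.0")
    version_file = script_dir / "BENCHMARK_VERSION"
    if version_file.exists():
        version = version_file.read_text().strip()
        if version:  # ignore an empty file and keep falling back
            return version

    # 2. Fallback: git tag. Without --always, git describe exits non-zero
    #    when no tag is reachable, so the hash fallback below stays distinct.
    try:
        result = subprocess.run(
            ["git", "describe", "--tags"],
            capture_output=True,
            text=True,
            timeout=2,
            check=False,
            cwd=script_dir,
        )
        tag = result.stdout.strip()
        if result.returncode == 0 and tag:
            # Tags are written as "v1.0.0"; drop the prefix to match the file format
            return tag[1:] if tag.startswith("v") else tag
    except (subprocess.SubprocessError, FileNotFoundError):
        pass

    # 3. Ultimate fallback: short git hash
    return _get_git_version(script_dir)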

2. API Repo (pinchbench/api)

Files to modify:

  • schema.sql:

    • Add label TEXT field for display name
    • Add semver TEXT field for semantic version
    • Keep id as PRIMARY KEY for backward compatibility
  • src/routes/results.ts:

    • Accept semantic versions in submissions
    • Handle both old git hashes and new semver
  • src/routes/benchmarkVersions.ts:

    • Return semver and label fields in API response
    • Sort by semver (not just created_at)
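
Handling both formats in the submission route comes down to telling a semver string apart from a legacy hash. The API is written in TypeScript, so the Python below is only a language-neutral sketch of the check. The name `classify_version` is hypothetical, the 7 to 40 character hash length is an assumption, and pre-release or build suffixes are deliberately not handled.

```python
import re

# Strict MAJOR.MINOR.PATCH with no leading zeros (no pre-release/build suffixes)
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")
# Assumed shape of a legacy id: 7 to 40 lowercase hex characters
GIT_HASH_RE = re.compile(r"[0-9a-f]{7,40}")


def classify_version(value: str) -> str:
    """Return 'semver', 'git_hash', or 'invalid' for a submitted benchmark_version."""
    if SEMVER_RE.fullmatch(value):
        return "semver"
    if GIT_HASH_RE.fullmatch(value):
        return "git_hash"
    return "invalid"


print(classify_version("1.0.0"))    # semver
print(classify_version("a1b2c3d"))  # git_hash
print(classify_version("1.0"))      # invalid
```

Whether "invalid" values should be rejected or stored as-is depends on the answer to open question 3.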

Proposed schema changes:

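-- Enhanced benchmark_versions table (target schema for fresh installs).
-- CREATE TABLE IF NOT EXISTS is a no-op on an existing table, so existing
-- databases need the ALTER TABLE migration in section 4 instead.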
CREATE TABLE IF NOT EXISTS benchmark_versions (
  id TEXT PRIMARY KEY,           -- Can be git hash OR semver
  semver TEXT,                   -- "1.0.0", "1.1.0", "2.0.0"
  label TEXT,                    -- Display label: "Version 1.0 (March 2025)"
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  current INTEGER NOT NULL DEFAULT 0,
  hidden INTEGER NOT NULL DEFAULT 0
);

-- Index for semver lookups
CREATE INDEX IF NOT EXISTS idx_benchmark_versions_semver ON benchmark_versions(semver);

3. Leaderboard Repo (pinchbench/leaderboard)

Files to modify:

  • components/version-selector.tsx:

    • Display label or semver instead of id
    • Sort versions semantically (not alphabetically)
    • Group by major version
  • lib/types.ts:

    • Add semver and label fields to version types
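
Semantic sorting and grouping by major version could look like the sketch below. The component itself is TypeScript, so this Python is illustrative only. It assumes each version is a record with `id` and a nullable `semver`, and that legacy (hash-only) rows are collected in a separate group; the helper names are hypothetical.

```python
def parse(semver: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints for numeric comparison."""
    major, minor, patch = (int(part) for part in semver.split("."))
    return major, minor, patch


def sort_and_group(versions: list) -> dict:
    """Group versions by major version, newest first; legacy rows go under None."""
    with_semver = [v for v in versions if v.get("semver")]
    legacy = [v for v in versions if not v.get("semver")]
    with_semver.sort(key=lambda v: parse(v["semver"]), reverse=True)

    groups = {}
    for v in with_semver:
        groups.setdefault(parse(v["semver"])[0], []).append(v)
    if legacy:
        groups[None] = legacy  # inserted last so it renders at the bottom
    return groups


versions = [
    {"id": "a1b2c3d", "semver": None},
    {"id": "1.9.0", "semver": "1.9.0"},
    {"id": "1.10.0", "semver": "1.10.0"},
    {"id": "2.0.0", "semver": "2.0.0"},
]
for major, items in sort_and_group(versions).items():
    print(major, [v["id"] for v in items])
```

Comparing integer tuples is what places 1.10.0 above 1.9.0; a plain string sort would order them the other way round.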

Proposed UI changes:

// Version selector display
// Before: "a1b2c3d"
// After: "Version 2.0" or "2.0.1"

4. Database Migration

Migration script needed:

-- Step 1: Add new columns
ALTER TABLE benchmark_versions ADD COLUMN semver TEXT;
ALTER TABLE benchmark_versions ADD COLUMN label TEXT;

-- Step 2: Backfill existing git-hash versions
-- This requires manual mapping or setting a cutoff date
-- Example: All submissions before 2025-03-01 = "0.x" (legacy)
--          Submissions after = "1.0.0"

-- Step 3: Update API to prefer semver/label over id
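
The first two steps can be rehearsed against an in-memory SQLite database before touching real data. The sketch below makes several assumptions: the sample rows and dates are invented for illustration, the cutoff is the example date from the comments above, and legacy rows keep semver NULL and receive only a label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE benchmark_versions (
      id TEXT PRIMARY KEY,
      created_at TEXT NOT NULL DEFAULT (datetime('now')),
      current INTEGER NOT NULL DEFAULT 0,
      hidden INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical sample rows, one on each side of the cutoff
    INSERT INTO benchmark_versions (id, created_at) VALUES
      ('a1b2c3d', '2025-01-15 00:00:00'),
      ('2a8f9d2', '2025-03-10 00:00:00');
""")

# Step 1: add the new nullable columns
conn.execute("ALTER TABLE benchmark_versions ADD COLUMN semver TEXT")
conn.execute("ALTER TABLE benchmark_versions ADD COLUMN label TEXT")

# Step 2: backfill by cutoff date (ISO-8601 text compares chronologically)
CUTOFF = "2025-03-01"
conn.execute(
    "UPDATE benchmark_versions SET label = 'Legacy (pre-1.0)' "
    "WHERE created_at < ?",
    (CUTOFF,),
)
conn.execute(
    "UPDATE benchmark_versions SET semver = '1.0.0', label = 'Version 1.0' "
    "WHERE created_at >= ?",
    (CUTOFF,),
)
conn.commit()

rows = conn.execute(
    "SELECT id, semver, label FROM benchmark_versions ORDER BY created_at"
).fetchall()
for row in rows:
    print(row)
```

Leaving semver NULL on legacy rows keeps the `semver IS NOT NULL` filter usable for separating new-style versions from old ones.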

5. CI/Build Process

GitHub Actions changes:

  • Add workflow to validate BENCHMARK_VERSION file matches git tag on release
  • Block releases if version file is not updated
  • Auto-populate benchmark_versions table with new semver on deployment
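
The consistency check in the first bullet could be as small as the comparison sketched here. The function name `version_matches_tag` is hypothetical, and the sketch assumes release tags are the version prefixed with "v" (as in v1.0.0).

```python
def version_matches_tag(version_text: str, tag: str) -> bool:
    """True when the BENCHMARK_VERSION contents equal the tag without its 'v' prefix."""
    version = version_text.strip()
    expected = tag[1:] if tag.startswith("v") else tag
    return version == expected


print(version_matches_tag("1.0.0\n", "v1.0.0"))  # True
print(version_matches_tag("1.0.0\n", "v1.1.0"))  # False
```

In a GitHub Actions workflow triggered by a tag push, the tag name is available in the GITHUB_REF_NAME environment variable; a step could read the file, call this check, and exit non-zero on a mismatch to block the release.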

Release process:

  1. Update BENCHMARK_VERSION file in skill repo
  2. Create git tag matching version (e.g., v1.0.0)
  3. API automatically picks up new version on first submission
  4. Admin manually sets current=1 for new version when ready

Migration Plan

Phase 1: Schema & API (Week 1)

  • Add semver and label columns to benchmark_versions table
  • Update API routes to return new fields
  • Ensure backward compatibility (old clients still work)

Phase 2: Skill Repo (Week 1-2)

  • Add BENCHMARK_VERSION file
  • Replace _get_git_version() with _get_benchmark_version(), which reads from the BENCHMARK_VERSION file
  • Add CI check for version file consistency

Phase 3: Leaderboard (Week 2)

  • Update version selector to display label or semver
  • Add semantic sorting for versions
  • Update type definitions

Phase 4: Data Migration (Week 3)

  • Backfill existing git-hash versions with semver labels
  • Decide on cutoff: existing = "Legacy (pre-1.0)", new = "1.0.0"
  • Update current flag for appropriate version

Phase 5: Documentation (Week 3)

  • Document versioning scheme in README
  • Add migration guide for users
  • Update API docs

Backward Compatibility Considerations

API compatibility:

  • Old submissions with git hashes continue to work
  • API returns both id (hash) and semver (new)
  • Frontend can handle missing semver fields

Database compatibility:

  • id remains PRIMARY KEY (never changes)
  • New columns are nullable (existing rows work)
  • Queries can filter by semver IS NOT NULL for new-style versions

Client compatibility:

  • Old skill versions still submit git hashes
  • API accepts both formats
  • Version selector falls back to id if label is null
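
The fallback in the last bullet amounts to a short preference chain. This Python sketch is illustrative only (the selector itself is a TypeScript component). It assumes version records carry `id` plus optional `semver` and `label`, and the step through `semver` between `label` and `id` is an assumption beyond what the bullet states.

```python
def display_name(version: dict) -> str:
    """Pick the dropdown text: label first, then semver, then the raw id."""
    return version.get("label") or version.get("semver") or version["id"]


print(display_name({"id": "1.0.0", "semver": "1.0.0", "label": "Version 1.0"}))  # Version 1.0
print(display_name({"id": "a1b2c3d", "semver": None, "label": None}))            # a1b2c3d
```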

Acceptance Criteria

  • Version dropdown on leaderboard shows readable labels (e.g., "Version 1.0")
  • New benchmark submissions use semantic versions (e.g., "1.0.0")
  • Old submissions with git hashes still display and function correctly
  • API returns semver and label fields for all versions
  • Skill repo reads version from BENCHMARK_VERSION file
  • Database schema supports both legacy and new versioning
  • Documentation explains the versioning scheme
  • Release process documented for maintainers

Open Questions

  1. Should we retroactively assign semver to old git hashes, or group them as "Legacy (pre-1.0)"?
  2. How do we handle multiple submissions with the same semver but different skill commits (patch releases)?
  3. Should the API enforce semver format validation?
  4. Do we need a changelog for benchmark versions (what changed between 1.0 and 2.0)?

Related Issues

  • None yet — this is the tracking issue

Labels: enhancement, meta, versioning, breaking-change
Assignees: TBD
