Meta-Issue: Migrate from Git Hash Versioning to Semantic Versioning
Summary
Currently, PinchBench uses git commit hashes (short SHA) to identify benchmark versions. This makes the version dropdown on the leaderboard confusing: users see opaque strings like `a1b2c3d` instead of meaningful version numbers like `1.0`, `1.1`, or `2.0`.
This issue tracks the work required to migrate to semantic versioning (SemVer) across all PinchBench components.
Current State (Git Hash Versioning)
How it works today:
- Skill repo (`scripts/benchmark.py`):
  ```python
  import subprocess
  from pathlib import Path


  def _get_git_version(script_dir: Path) -> str:
      result = subprocess.run(
          ["git", "rev-parse", "--short", "HEAD"],
          capture_output=True,
          text=True,
          timeout=2,
          check=False,
          cwd=script_dir,
      )
      return result.stdout.strip()  # e.g., "a1b2c3d"
  ```
  The benchmark results JSON includes `"benchmark_version": "a1b2c3d"`.
- API repo (`src/routes/results.ts`):
  - Accepts `benchmark_version` in the submission payload
  - Stores it in the `submissions.benchmark_version` TEXT field
  - Auto-inserts it into the `benchmark_versions` table with `id` = git hash
- Database schema (`schema.sql`):
  ```sql
  CREATE TABLE benchmark_versions (
    id TEXT PRIMARY KEY,  -- Currently stores git hash
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    current INTEGER NOT NULL DEFAULT 0,
    hidden INTEGER NOT NULL DEFAULT 0
  );
  ```
- Leaderboard frontend (`components/version-selector.tsx`):
  - Fetches versions from `/api/benchmark_versions`
  - Displays `version.id` (git hash) in the dropdown
  - Users must guess which hash corresponds to which benchmark iteration
Problems with current approach:
- User confusion: Git hashes are meaningless to users
- No indication of breaking changes: Can't tell if a new version changes the benchmark logic
- Difficult to communicate: "Try the new version 2a8f9d2" vs "Try version 2.0"
- No versioning semantics: Major/minor/patch distinctions are impossible
Desired State (Semantic Versioning)
Goals:
- Clear version numbers: `1.0`, `1.1`, `2.0` instead of `a1b2c3d`
- Meaningful increments:
  - MAJOR: Breaking changes to tasks or grading logic
  - MINOR: New tasks, non-breaking improvements
  - PATCH: Bug fixes, task clarifications
- Human-readable labels: "Version 1.0 (March 2025)" in the UI
- Backward compatibility: Old submissions with git hashes continue to work
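A minimal sketch of how the increment rules map onto version numbers, assuming versions are plain `MAJOR.MINOR.PATCH` strings; `bump` is a hypothetical helper, not existing PinchBench code.

```python
def bump(version: str, part: str) -> str:
    """Return the next version for a MAJOR, MINOR, or PATCH change."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        # Breaking change to tasks or grading logic: reset minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":
        # New tasks or non-breaking improvements: reset patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Bug fixes and task clarifications
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump("1.1.3", "major"))  # 2.0.0
print(bump("1.1.3", "minor"))  # 1.2.0
print(bump("1.1.3", "patch"))  # 1.1.4
```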
Proposed version format:
`MAJOR.MINOR.PATCH` (e.g., `1.0.0`)
Simplified for UI: `1.0`, `1.1`, `2.0` (the patch number is omitted when it is `0`)
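The UI shortening rule could be implemented as below, assuming the patch number is dropped only when it is zero so that a patch release such as `2.0.1` stays distinguishable; `format_for_ui` is a hypothetical name.

```python
def format_for_ui(semver: str) -> str:
    """Shorten "MAJOR.MINOR.PATCH" for display, dropping a zero patch."""
    major, minor, patch = semver.split(".")
    if int(patch) == 0:
        return f"{major}.{minor}"
    return semver


print(format_for_ui("1.0.0"))  # 1.0
print(format_for_ui("2.0.1"))  # 2.0.1
```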
Components Requiring Changes
1. Skill Repo (pinchbench/skill)
Files to modify:
- `scripts/benchmark.py`:
  - Replace `_get_git_version()` with `_get_benchmark_version()`
  - Read from the `BENCHMARK_VERSION` file or a git tag
  - Fall back to the git hash if no version file exists
- `scripts/lib_upload.py`:
  - No changes needed (reads the version from the results JSON)
- New file `BENCHMARK_VERSION`:
  - Simple text file containing the version string (e.g., `1.0.0`)
  - Updated manually when releasing new benchmark versions
Proposed implementation:
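```python
import subprocess
from pathlib import Path


def _get_benchmark_version(script_dir: Path) -> str:
    """Read the benchmark version from the BENCHMARK_VERSION file or a git tag."""
    version_file = script_dir / "BENCHMARK_VERSION"
    if version_file.exists():
        return version_file.read_text().strip()
    # Fall back to the nearest git tag; with --always, git prints the
    # abbreviated commit hash instead when no tag is reachable.
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always"],
            capture_output=True,
            text=True,
            timeout=2,
            check=False,
            cwd=script_dir,
        )
        if result.returncode == 0:
            # Release tags are named like "v1.0.0"; drop the "v" so the
            # reported version matches the BENCHMARK_VERSION format.
            return result.stdout.strip().removeprefix("v")
    except (subprocess.SubprocessError, FileNotFoundError):
        pass
    # Ultimate fallback to the existing git-hash helper
    return _get_git_version(script_dir)
```
2. API Repo (pinchbench/api)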
Files to modify:
- `schema.sql`:
  - Add a `label` TEXT field for the display name
  - Add a `semver` TEXT field for the semantic version
  - Keep `id` as the PRIMARY KEY for backward compatibility
- `src/routes/results.ts`:
  - Accept semantic versions in submissions
  - Handle both old git hashes and new semver values
- `src/routes/benchmarkVersions.ts`:
  - Return `semver` and `label` fields in the API response
  - Sort by semver (not just `created_at`)
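The dual-format handling in `src/routes/results.ts` could start by classifying each incoming `benchmark_version`. The sketch is in Python for brevity (the actual route is TypeScript), and the regular expressions, the `classify_version` name, and the three-way result are assumptions, not existing PinchBench code.

```python
import re

# Hypothetical patterns: strict MAJOR.MINOR.PATCH (no leading zeros),
# and a 7-40 character lowercase hex git SHA.
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")
GIT_HASH_RE = re.compile(r"[0-9a-f]{7,40}")


def classify_version(benchmark_version: str) -> str:
    """Return "semver", "git_hash", or "other" for a submitted version string."""
    value = benchmark_version.strip()
    if SEMVER_RE.fullmatch(value):
        return "semver"
    if GIT_HASH_RE.fullmatch(value):
        return "git_hash"
    # e.g. "v1.0.0" or "1.0.0-3-gabc1234"; whether to reject these is undecided
    return "other"


for v in ["1.0.0", "a1b2c3d", "v1.0.0"]:
    print(v, "->", classify_version(v))
```

Keeping an explicit `"other"` bucket leaves the validation decision raised under Open Questions to the route, rather than baking it into the parser.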
Proposed schema changes:
```sql
-- Enhanced benchmark_versions table (target shape). CREATE TABLE IF NOT EXISTS
-- leaves an existing table untouched, so live databases get the new columns
-- through the ALTER TABLE migration instead.
CREATE TABLE IF NOT EXISTS benchmark_versions (
  id TEXT PRIMARY KEY,  -- Can be git hash OR semver
  semver TEXT,          -- "1.0.0", "1.1.0", "2.0.0"
  label TEXT,           -- Display label: "Version 1.0 (March 2025)"
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  current INTEGER NOT NULL DEFAULT 0,
  hidden INTEGER NOT NULL DEFAULT 0
);
-- Index for semver lookups
CREATE INDEX IF NOT EXISTS idx_benchmark_versions_semver ON benchmark_versions(semver);
```
3. Leaderboard Repo (pinchbench/leaderboard)
Files to modify:
- `components/version-selector.tsx`:
  - Display `label` or `semver` instead of `id`
  - Sort versions semantically (not alphabetically)
  - Group by major version
- `lib/types.ts`:
  - Add `semver` and `label` fields to the version types
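The semantic sorting and major-version grouping planned for `components/version-selector.tsx` can be sketched as follows, in Python for brevity (the component itself is TypeScript). `semver_key` and `group_by_major` are hypothetical names, and the sketch assumes every input is a well-formed `MAJOR.MINOR.PATCH` string, so legacy git-hash entries would need to be set aside first.

```python
from collections import defaultdict


def semver_key(version: str) -> tuple[int, ...]:
    """Numeric sort key, so "1.10.0" sorts after "1.2.0"."""
    return tuple(int(part) for part in version.split("."))


def group_by_major(versions: list[str]) -> dict[int, list[str]]:
    """Group versions by MAJOR; newest major first, newest version first."""
    groups: dict[int, list[str]] = defaultdict(list)
    for v in versions:
        groups[semver_key(v)[0]].append(v)
    return {
        major: sorted(vs, key=semver_key, reverse=True)
        for major, vs in sorted(groups.items(), reverse=True)
    }


print(sorted(["1.2.0", "1.10.0"]))  # ['1.10.0', '1.2.0'] (alphabetical: wrong)
print(sorted(["1.2.0", "1.10.0"], key=semver_key))  # ['1.2.0', '1.10.0']
print(group_by_major(["1.0.0", "2.0.0", "1.10.0", "1.2.0"]))
# {2: ['2.0.0'], 1: ['1.10.0', '1.2.0', '1.0.0']}
```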
Proposed UI changes:
```tsx
// Version selector display
// Before: "a1b2c3d"
// After: "Version 2.0" or "2.0.1"
```
4. Database Migration
Migration script needed:
```sql
-- Step 1: Add new columns
ALTER TABLE benchmark_versions ADD COLUMN semver TEXT;
ALTER TABLE benchmark_versions ADD COLUMN label TEXT;
-- Step 2: Backfill existing git-hash versions
-- This requires manual mapping or setting a cutoff date
-- Example: All submissions before 2025-03-01 = "0.x" (legacy)
-- Submissions after = "1.0.0"
-- Step 3: Update API to prefer semver/label over id
```
5. CI/Build Process
GitHub Actions changes:
- Add a workflow that validates that the `BENCHMARK_VERSION` file matches the git tag on release
- Block releases if the version file has not been updated
- Auto-populate the `benchmark_versions` table with the new semver on deployment
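A minimal sketch of the release check, assuming the tag convention from the release process (`v` followed by the contents of `BENCHMARK_VERSION`). `check_release` is hypothetical; a real workflow step would read the file and the tag itself and pass them in.

```python
import re

# Strict MAJOR.MINOR.PATCH, no leading zeros (assumed format for the file)
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def check_release(file_version: str, tag: str) -> list[str]:
    """Return a list of problems; an empty list means the release may proceed."""
    problems = []
    version = file_version.strip()
    if not SEMVER_RE.fullmatch(version):
        problems.append(f"BENCHMARK_VERSION is not MAJOR.MINOR.PATCH: {version!r}")
    # Assumed convention: tag "v1.0.0" for BENCHMARK_VERSION "1.0.0"
    if tag != f"v{version}":
        problems.append(f"tag {tag!r} does not match BENCHMARK_VERSION {version!r}")
    return problems


print(check_release("1.1.0\n", "v1.1.0"))  # []
print(check_release("1.0.0\n", "v1.1.0"))
```

In a tag-triggered GitHub Actions job, the tag name is available in the `GITHUB_REF_NAME` environment variable, so a step could pass it in and fail the job whenever the returned list is non-empty. A stale version file then shows up as a tag mismatch.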
Release process:
- Update the `BENCHMARK_VERSION` file in the skill repo
- Create a git tag matching the version (e.g., `v1.0.0`)
- API automatically picks up the new version on first submission
- Admin manually sets `current=1` for the new version when ready
Migration Plan
Phase 1: Schema & API (Week 1)
- Add `semver` and `label` columns to the `benchmark_versions` table
- Update API routes to return the new fields
- Ensure backward compatibility (old clients still work)
Phase 2: Skill Repo (Week 1-2)
- Add the `BENCHMARK_VERSION` file
- Replace `_get_git_version()` with `_get_benchmark_version()`, which reads the `BENCHMARK_VERSION` file
- Add a CI check for version file consistency
Phase 3: Leaderboard (Week 2)
- Update the version selector to display `label` or `semver`
- Add semantic sorting for versions
- Update type definitions
Phase 4: Data Migration (Week 3)
- Backfill existing git-hash versions with semver labels
- Decide on cutoff: existing = "Legacy (pre-1.0)", new = "1.0.0"
- Update the `current` flag for the appropriate version
Phase 5: Documentation (Week 3)
- Document versioning scheme in README
- Add migration guide for users
- Update API docs
Backward Compatibility Considerations
API compatibility:
- Old submissions with git hashes continue to work
- API returns both `id` (hash) and `semver` (new)
- Frontend can handle missing `semver` fields
Database compatibility:
- `id` remains the PRIMARY KEY (never changes)
- New columns are nullable (existing rows keep working)
- Queries can filter by `semver IS NOT NULL` for new-style versions
Client compatibility:
- Old skill versions still submit git hashes
- API accepts both formats
- Version selector falls back to `id` if `label` is null
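The selector's fallback chain could look like this, sketched in Python (the selector itself is TypeScript). Preferring `label`, then `semver`, then `id` combines the plan to display `label` or `semver` with the fallback to `id`; `display_name` and the `Version ` prefix for bare semver values are assumptions.

```python
from typing import Optional


def display_name(version_id: str, semver: Optional[str], label: Optional[str]) -> str:
    """Pick the most readable name available for a version."""
    if label:
        return label
    if semver:
        # Hypothetical formatting for rows that have a semver but no label
        return f"Version {semver}"
    # Legacy rows: only the git hash is known
    return version_id


print(display_name("a1b2c3d", None, None))  # a1b2c3d
print(display_name("1.1.0", "1.1.0", None))  # Version 1.1.0
```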
Acceptance Criteria
- Version dropdown on leaderboard shows readable labels (e.g., "Version 1.0")
- New benchmark submissions use semantic versions (e.g., "1.0.0")
- Old submissions with git hashes still display and function correctly
- API returns `semver` and `label` fields for all versions
- Skill repo reads the version from the `BENCHMARK_VERSION` file
- Database schema supports both legacy and new versioning
- Documentation explains the versioning scheme
- Release process documented for maintainers
Open Questions
- Should we retroactively assign semver to old git hashes, or group them as "Legacy (pre-1.0)"?
- How do we handle multiple submissions with the same semver but different skill commits (patch releases)?
- Should the API enforce semver format validation?
- Do we need a changelog for benchmark versions (what changed between 1.0 and 2.0)?
Related Issues
- None yet — this is the tracking issue
Labels: enhancement, meta, versioning, breaking-change
Assignees: TBD