
@khromov
Collaborator

@khromov khromov commented Dec 22, 2025

Since we decided to do only one run per model, as opposed to pass@k, I brought back the old v1 tests for now. The goal is to merge some of them into more complex tests in the future, and to add new tests. All the old tests have been rewritten with the "test first" approach. This will also give a percentage score for each model.
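With one run per model, the percentage score comes down to passed over total. Below is a minimal sketch, assuming the score is computed per test suite (it could equally be computed per unit test); TestResult and percentageScore are hypothetical names, not taken from the benchmark code:

```typescript
// Hypothetical result shape: one run per model, so each test suite is
// simply passed or failed for that model.
interface TestResult {
  suite: string;
  passed: boolean;
}

// Percentage of suites passed, rounded to one decimal place.
// Returns 0 when there are no results, to avoid dividing by zero.
function percentageScore(results: TestResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.passed).length;
  return Math.round((passed / results.length) * 1000) / 10;
}

const example: TestResult[] = [
  { suite: "derived", passed: true },
  { suite: "derived-by", passed: false },
  { suite: "each", passed: true },
  { suite: "effect", passed: true },
];
console.log(percentageScore(example)); // 75
```

Rounding to one decimal keeps scores like 2 of 3 readable (66.7) in reports.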

I've also added validator.ts support for tests, so we can check that the AI actually uses the intended functions: for example, that $derived.by is used, rather than the model producing a component that passes the tests with $effect or similar without using the right functions.
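A minimal sketch of what such a validator could look like, assuming a validator is a function from the generated component source to a pass/fail result with messages. The Validator type, the ValidationResult shape and validateDerivedBy are hypothetical names, not the actual validator.ts API. It uses a naive regex check, so a rune that only appears in a comment or string would still count; a real implementation might parse the component instead.

```typescript
// Hypothetical validator contract: takes the generated component source and
// reports whether it uses the rune the test is meant to exercise.
interface ValidationResult {
  valid: boolean;
  errors: string[];
}

type Validator = (code: string) => ValidationResult;

// Example validator for a derived-by test: the component must call
// $derived.by(...) and must not fall back to $effect(...).
const validateDerivedBy: Validator = (code) => {
  const errors: string[] = [];
  if (!/\$derived\.by\s*\(/.test(code)) {
    errors.push("Expected $derived.by to be used");
  }
  if (/\$effect\s*\(/.test(code)) {
    errors.push("Did not expect $effect to be used");
  }
  return { valid: errors.length === 0, errors };
};

// Usage: one component that uses the intended rune, one that works around it.
const good = "<script>let count = $state(0); const doubled = $derived.by(() => count * 2);</script>";
const bad = "<script>let count = $state(0); let doubled = $state(0); $effect(() => { doubled = count * 2; });</script>";
console.log(validateDerivedBy(good)); // { valid: true, errors: [] }
console.log(validateDerivedBy(bad).errors.length); // 2
```

Running validators before the unit tests means a component that passes via a workaround is still reported as failing validation.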

Main changes

  • Settings persistence: Save/restore preferences in .ai-settings.json
  • Code validation: Pre-test validation with custom validators
  • Retry logic: 10 attempts with exponential backoff via p-retry
  • Improved reporting: Score calculation, unit test totals, validation results
  • Enhanced HTML reports: Score badges, validation section, unit test counts
  • CI updates: Split self-tests and benchmark verification
  • New tests: Added 7 test suites (derived, derived-by, each, effect, inspect, props, snippets)
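For the settings persistence item, a minimal save/restore sketch using Node's fs module. Only the .ai-settings.json filename comes from this PR; the Settings shape (selectedModels) and the helper names are hypothetical. Missing or unparseable files fall back to defaults:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical settings shape; the real .ai-settings.json fields may differ.
interface Settings {
  selectedModels: string[];
}

// Fresh object each call so callers never share (and mutate) the defaults.
function defaultSettings(): Settings {
  return { selectedModels: [] };
}

// Restore saved preferences, falling back to defaults if the file is
// missing or is not valid JSON.
function loadSettings(path = ".ai-settings.json"): Settings {
  if (!existsSync(path)) return defaultSettings();
  try {
    const saved = JSON.parse(readFileSync(path, "utf8")) as Partial<Settings>;
    return { ...defaultSettings(), ...saved };
  } catch {
    return defaultSettings();
  }
}

// Persist preferences as pretty-printed JSON so the file stays readable.
function saveSettings(settings: Settings, path = ".ai-settings.json"): void {
  writeFileSync(path, JSON.stringify(settings, null, 2) + "\n", "utf8");
}

// Round trip through a temp file instead of the real settings file.
const examplePath = join(tmpdir(), "ai-settings-example.json");
saveSettings({ selectedModels: ["model-a"] }, examplePath);
console.log(loadSettings(examplePath).selectedModels); // [ 'model-a' ]
```

Merging saved values over the defaults means a settings file written by an older version still loads when new fields are added.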
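For the retry item, a dependency-free sketch of exponential backoff. This is a hand-rolled stand-in, not p-retry itself: p-retry is called as pRetry(fn, { retries, factor, minTimeout, maxTimeout }), and its retries option counts retries after the first attempt rather than total attempts. RetryOptions, withRetry and backoffDelay are hypothetical names:

```typescript
// Hypothetical options for a hand-rolled stand-in for p-retry.
interface RetryOptions {
  attempts: number; // total attempts, including the first one
  minTimeout: number; // delay before the first retry, in ms
  factor: number; // multiplier applied to the delay on each further retry
  maxTimeout: number; // upper bound on any single delay, in ms
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retry` (1-based): minTimeout * factor^(retry - 1), capped.
function backoffDelay(retry: number, opts: RetryOptions): number {
  return Math.min(opts.minTimeout * opts.factor ** (retry - 1), opts.maxTimeout);
}

// Call fn until it resolves or the attempt budget is spent, sleeping with
// exponential backoff between attempts; rethrows the last error on failure.
async function withRetry<T>(fn: (attempt: number) => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < opts.attempts) await sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}

// Usage: a call that fails twice, then succeeds (short delays for the demo).
let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error(`transient failure #${calls}`);
  return "ok";
};

withRetry(flaky, { attempts: 10, minTimeout: 10, factor: 2, maxTimeout: 1000 }).then((result) => {
  console.log(result, calls); // ok 3
});
```

Capping the delay keeps the total wait bounded even with 10 attempts: with a 1 s start, a factor of 2 and a 30 s cap, the later retries all wait 30 s instead of growing to several minutes.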
