skill-eval: data-model-designer 5.5 -> 7.0/10 (eval report + improved SKILL.md) by jensen-srp · Pull Request #174 · openclaw/skills

jensen-srp · 2026-03-10T09:48:46Z

skill-eval Report: data-model-designer

Original Score: 5.5/10 (Conditional) -> Improved: 7.0/10 (Recommended)

Blind A/B evaluation of the data-model-designer skill by @datadrivenconstruction, run with skill-eval, an automated evaluation engine that has tested 123 ClawHub skills.

Finding

The current SKILL.md is an 11 KB Python class definition (ConstructionDataModel). When it is loaded, the model tries to use these classes, which costs 83% more time and 113% more tokens than the no-skill baseline without producing better schemas.
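
For readers who want to see how overhead figures of this kind are derived, here is a minimal sketch of the relative-overhead arithmetic. The function name `relative_overhead` and the sample measurements are hypothetical, chosen only so the result matches the percentages quoted above; they are not values taken from the eval report.

```python
def relative_overhead(with_skill: float, baseline: float) -> float:
    """Return the percentage increase of with_skill over baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (with_skill - baseline) * 100.0 / baseline


# Hypothetical measurements, picked to reproduce the quoted percentages.
baseline_seconds, skill_seconds = 100.0, 183.0
baseline_tokens, skill_tokens = 1000, 2130

time_overhead = relative_overhead(skill_seconds, baseline_seconds)
token_overhead = relative_overhead(skill_tokens, baseline_tokens)
print(f"time: +{time_overhead:.0f}%, tokens: +{token_overhead:.0f}%")
```

Note that +113% tokens means the skill-loaded run used slightly more than twice the baseline token count.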

Improvement

Rewrote SKILL.md as a behavioral contract:

Removed all Python code (the model already knows how to code)
Added mandatory output sections: Requirements Summary, ER Diagram, SQL DDL, Validation Checklist, Sample Queries
Added SQL style rules and explicit prohibitions
Result: overhead dropped, score jumped from 5.5 to 7.0
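
The mandatory output sections lend themselves to a deterministic check. The sketch below is an assumption about how such a check might look, not code from skill-eval: `missing_sections` is a hypothetical helper that scans a model response for each section name listed above, ignoring case.

```python
REQUIRED_SECTIONS = [
    "Requirements Summary",
    "ER Diagram",
    "SQL DDL",
    "Validation Checklist",
    "Sample Queries",
]


def missing_sections(response: str) -> list[str]:
    """Return required section names that do not appear in the response."""
    lowered = response.lower()
    return [name for name in REQUIRED_SECTIONS if name.lower() not in lowered]


# Hypothetical response that stops after the DDL section.
sample = """
## Requirements Summary
...
## ER Diagram
...
## SQL DDL
...
"""
print(missing_sections(sample))  # ['Validation Checklist', 'Sample Queries']
```

A substring match is deliberately loose; a stricter variant could require each name to appear as a heading.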

Files added

| File | Description |
| --- | --- |
| `EVAL-REPORT.md` | Original blind A/B evaluation |
| `EVAL-IMPROVED.md` | Before/after comparison card |
| `SKILL-improved.md` | Proposed improved SKILL.md |
| `IMPROVEMENT-LOG.md` | Detailed changelog with rationale |

Methodology

Blind A/B: same prompts tested with and without skill loaded
Two-layer assertions: deterministic checks + rubric-based quality scoring
Model: Claude Opus 4
Full methodology
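
As an illustration of the methodology above, here is a minimal sketch of an A/B comparison with two assertion layers. Everything in it is hypothetical: the `Trial` record, the gating rule (a failed deterministic check zeroes the trial score), and the sample rubric scores are assumptions made for illustration, not skill-eval's actual implementation or data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One prompt run in one arm (with or without the skill loaded)."""
    deterministic_pass: bool  # layer 1: e.g. required sections present
    rubric_score: float       # layer 2: quality score on a 0-10 scale


def trial_score(trial: Trial) -> float:
    # Assumed gating rule: failing layer 1 zeroes the trial score.
    return trial.rubric_score if trial.deterministic_pass else 0.0


def arm_score(trials: list[Trial]) -> float:
    """Mean trial score for one arm of the A/B test."""
    return mean(trial_score(t) for t in trials)


# Hypothetical results for the same three prompts in each arm.
with_skill = [Trial(True, 7.0), Trial(True, 8.0), Trial(False, 6.0)]
without_skill = [Trial(True, 6.0), Trial(True, 6.0), Trial(True, 6.0)]

delta = arm_score(with_skill) - arm_score(without_skill)
print(f"with: {arm_score(with_skill):.1f}, "
      f"without: {arm_score(without_skill):.1f}, delta: {delta:+.1f}")
```

Running both arms on identical prompts is what makes the delta attributable to the skill rather than to prompt difficulty.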

Leaderboard

All 123 evaluated skills: jensen-srp.github.io/skill-eval

Filed by skill-eval v0.4.0

openclaw-barnacle · 2026-03-10T09:48:55Z

Thanks for the pull request! This repository is read-only and is automatically synced from https://clawhub.ai, so we can’t accept changes here. Please make updates on the website instead.

jensen-srp added 4 commits March 10, 2026 17:48

Add eval report for data-model-designer

511bc4b

Add improvement comparison for data-model-designer

e020ff7

Add improved SKILL.md for data-model-designer

2caab66

Add improvement log for data-model-designer

93fdad5

openclaw-barnacle bot closed this Mar 10, 2026

jensen-srp wants to merge 4 commits into openclaw:main from jensen-srp:skill-eval/data-model-designer
