Commit 8df09c9

docs: update README with v1.1.0 scoring criteria
1 parent d071790 commit 8df09c9

1 file changed

Lines changed: 44 additions & 9 deletions

README.md

@@ -198,28 +198,40 @@ SkillScore evaluates skills across **8 weighted categories**:
 
 | Category | Weight | Description |
 |----------|--------|-------------|
-| **Structure** | 15% | SKILL.md exists, clear name/description, follows conventions |
+| **Structure** | 15% | SKILL.md exists, clear name/description, file organization, artifact output spec |
 | **Clarity** | 20% | Specific actionable instructions, no ambiguity, logical order |
-| **Safety** | 20% | No destructive commands without confirmation, respects permissions |
+| **Safety** | 20% | No destructive commands, respects permissions, network containment |
 | **Dependencies** | 10% | Lists required tools/APIs, install instructions, env vars |
 | **Error Handling** | 10% | Failure instructions, fallbacks, no silent failures |
-| **Scope** | 10% | Single responsibility, accurate description, specific triggers |
-| **Documentation** | 10% | Usage examples, expected I/O, troubleshooting |
+| **Scope** | 10% | Single responsibility, routing quality, negative examples |
+| **Documentation** | 10% | Usage examples, embedded templates, expected I/O |
 | **Portability** | 5% | Cross-platform, no hardcoded paths, relative paths |
 
 ### Scoring Methodology
 
 Each category is scored from 0-10 points based on specific criteria:
 
-- **Structure**: Checks for SKILL.md existence, clear naming, proper organization
+- **Structure**: Checks for SKILL.md existence, clear naming, proper organization, and whether outputs/artifacts are defined
 - **Clarity**: Analyzes instruction specificity, ambiguity, logical flow
-- **Safety**: Scans for destructive commands, security risks, permission issues
+- **Safety**: Scans for destructive commands, security risks, permission issues, and network containment (does the skill scope network access when using HTTP/APIs?)
 - **Dependencies**: Validates tool listings, installation instructions, environment setup
 - **Error Handling**: Reviews error scenarios, fallback strategies, validation
-- **Scope**: Assesses single responsibility, trigger clarity, conflict potential
-- **Documentation**: Evaluates examples, I/O documentation, troubleshooting guides
+- **Scope**: Assesses single responsibility, trigger clarity, conflict potential, **negative routing examples** ("don't use when..."), and **routing quality** (concrete signals vs vague descriptions)
+- **Documentation**: Evaluates examples, I/O documentation, troubleshooting guides, and **embedded templates/worked examples** with expected output
 - **Portability**: Checks cross-platform compatibility, path handling, limitations
 
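Reviewer note: the README gives the category weights and the 0-10 per-category scale but does not say how they combine into a final score. A minimal sketch, assuming a plain weighted sum scaled to 0-100 (the function name `overall_score` and the aggregation rule are assumptions, not taken from the SkillScore source):

```python
# Weights copied from the category table in the README (percent of total).
WEIGHTS = {
    "Structure": 15,
    "Clarity": 20,
    "Safety": 20,
    "Dependencies": 10,
    "Error Handling": 10,
    "Scope": 10,
    "Documentation": 10,
    "Portability": 5,
}


def overall_score(category_scores: dict[str, float]) -> float:
    """Combine 0-10 category scores into a 0-100 weighted total.

    Assumption: a simple weighted sum; the real tool may aggregate differently.
    """
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")

    total = 0.0
    for name, weight in WEIGHTS.items():
        score = category_scores[name]
        if not 0 <= score <= 10:
            raise ValueError(f"{name} score must be within 0-10, got {score}")
        # Each category contributes its weight times the fraction of points earned.
        total += weight * score / 10
    return total
```

Under this assumption a skill that scores 10 everywhere except 5 in Safety loses half of Safety's 20% weight and lands at 90.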
+### v1.1.0: Production-Validated Checks
+
+Five new sub-criteria added in v1.1.0, inspired by [OpenAI's Skills + Shell + Compaction blog](https://developers.openai.com/blog/skills-shell-tips) and production data from Glean:
+
+| Check | Category | Points | Why It Matters |
+|-------|----------|--------|----------------|
+| **Negative routing examples** | Scope | 2 | Skills that say when NOT to use them trigger ~20% more accurately (Glean data) |
+| **Routing quality** | Scope | 1 | Descriptions with concrete tool names, I/O, and "use when" patterns route better than marketing copy |
+| **Embedded templates** | Documentation | 2 | Real output templates inside the skill drove the biggest quality + latency gains in production |
+| **Network containment** | Safety | 1 | Skills combining tools + open network access are a data exfiltration risk without scoping |
+| **Artifact output spec** | Structure | 1 | Skills that define where outputs go create clean review boundaries |
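Reviewer note: the table lists what each check rewards but not how a check is detected. The sketch below shows one way three of the five checks could be approximated with text heuristics over a SKILL.md. The patterns and the `run_checks` helper are hypothetical and are not taken from the SkillScore implementation; routing quality and network containment are left out because they likely need more than a keyword match.

```python
import re

# Points per check, taken from the v1.1.0 table.
POINTS = {
    "negative_routing": 2,
    "embedded_templates": 2,
    "artifact_output_spec": 1,
}

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid a literal fence

# Hypothetical detection patterns, for illustration only.
PATTERNS = {
    # A "When NOT to Use" heading or a "Don't use this skill when" sentence.
    "negative_routing": re.compile(
        r"when not to use|don'?t use (this skill )?when", re.IGNORECASE
    ),
    # Any fenced block with a language tag, e.g. a JSON output template.
    "embedded_templates": re.compile(rf"^{FENCE}\w+\s*$", re.MULTILINE),
    # An "Output" heading or a "written to `path`" statement.
    "artifact_output_spec": re.compile(
        r"^#{1,6}\s+outputs?\b|written to\s+`[^`]+`",
        re.IGNORECASE | re.MULTILINE,
    ),
}


def run_checks(skill_md: str) -> dict[str, int]:
    """Return the points awarded per check for the given SKILL.md text."""
    return {
        name: POINTS[name] if PATTERNS[name].search(skill_md) else 0
        for name in POINTS
    }
```

Keyword heuristics like these are cheap but easy to game, so a real scorer would probably pair them with structural parsing of the Markdown.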
+
 ### Grade Scale
 
 | Grade | Score Range | Description |
@@ -261,6 +273,16 @@ my-skill/
 
 Brief description of what this skill does and when to use it.
 
+## When to Use
+
+Use this skill when you need to [specific task] with [specific tools/inputs].
+
+## When NOT to Use
+
+Don't use this skill when:
+- The task is [alternative scenario] — use [other skill] instead
+- You need [different capability]
+
 ## Dependencies
 
 - Tool 1: Installation instructions
@@ -273,6 +295,10 @@ Brief description of what this skill does and when to use it.
 2. Specific commands to run
 3. Expected outputs
 
+## Output
+
+Results are written to `./output/` as JSON files.
+
 ## Error Handling
 
 - Common issues and solutions
@@ -281,6 +307,15 @@ Brief description of what this skill does and when to use it.
 
 ## Examples
 
+### Example Output
+
+```json
+{
+  "status": "success",
+  "result": "Example of what the skill produces"
+}
+```
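Reviewer note: the template now states that results go to `./output/` as JSON and shows the expected shape. A minimal sketch of a script honoring both is below. The function name `write_result` and the file name `result.json` are assumptions made for illustration; the template does not prescribe either.

```python
import json
from pathlib import Path


def write_result(
    result: str,
    output_dir: Path = Path("output"),
    name: str = "result.json",  # hypothetical file name
) -> Path:
    """Write a result in the template's example shape and return the file path."""
    # Create the output directory on first use so the script runs from a clean checkout.
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / name
    payload = {"status": "success", "result": result}
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return path
```

Using a relative default path keeps the sketch consistent with the Portability criterion (no hardcoded absolute paths).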
+
 ```bash
 # Working example
 ./scripts/main.py --input "test data"
@@ -438,7 +473,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
 - Inspired by the need for quality assessment in AI agent skills
 - Built for the OpenClaw and Claude Code communities
 - Thanks to all contributors and skill creators
-- Scoring methodology informed by software engineering best practices
+- Scoring methodology informed by software engineering best practices and [OpenAI's production skill patterns](https://developers.openai.com/blog/skills-shell-tips)
 
 ## 📊 Example Scores
 