Spec proposal: corteza.skill/v1 versioned bundle format

## TL;DR

Promote the existing ad-hoc SKILL.md / SKILL.json handling to a versioned, documented format (`corteza.skill/v1`). New fields drive dispatch enforcement, invocation gating, side-effect disclosure, and trust provenance. Auto-derivation from R packages (`skillify_package()`) becomes a documented pipeline.

## What `corteza.skill/v1` *is*

A written spec — a markdown document in the corteza repo (proposed: `inst/spec/corteza.skill-v1.md`) defining the required/optional frontmatter fields, body conventions, directory layouts, and the versioning + migration rules. Bundles opt in via `skill_format: corteza.skill/v1` in their frontmatter. Consumers (corteza, future R-side agents) read the spec to know how to interpret a bundle. Same pattern as MIME types: an identifier in the file pointing to an externally documented spec.
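
A reader-side version check might look like the sketch below. `check_skill_format()` is a hypothetical helper, not an existing corteza function, and treating a missing `skill_format` as a legacy (pre-v1) bundle is an assumption the spec would need to confirm.

```r
# Hypothetical helper: decide whether a reader that implements the formats in
# `supported` can interpret a bundle's `skill_format` identifier.
check_skill_format <- function(frontmatter, supported = "corteza.skill/v1") {
  fmt <- frontmatter[["skill_format"]]
  if (is.null(fmt)) {
    # Assumption: no identifier means an existing ad-hoc (pre-v1) bundle.
    return(list(ok = TRUE, format = NA_character_, legacy = TRUE))
  }
  if (!grepl("^corteza\\.skill/v[0-9]+$", fmt)) {
    stop("Unrecognized skill_format: ", fmt, call. = FALSE)
  }
  # A well-formed but unsupported version is refused, not an error.
  list(ok = fmt %in% supported, format = fmt, legacy = FALSE)
}

check_skill_format(list(skill_format = "corteza.skill/v1"))$ok  # TRUE
check_skill_format(list(skill_format = "corteza.skill/v2"))$ok  # FALSE
check_skill_format(list(name = "old-bundle"))$legacy            # TRUE
```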

## Current state vs proposed

| Capability | Today in corteza | `corteza.skill/v1` adds |
|---|---|---|
| Bundle format | ad-hoc YAML frontmatter + optional `SKILL.json` sidecar (`parse_skill_md`, `parse_skill_json`) | versioned `skill_format:` header field |
| Directory layout | flat (`path/SKILL.md`) and nested (`path/<name>/SKILL.md`) both supported | (no change; both layouts stay valid) |
| Parsing | `parse_skill_md`, `parse_skill_json`, hand-rolled `parse_yaml_simple` | extended parser (or YAML dep) for nested/list-typed fields |
| Discovery roots | `corteza_data_path('skills')`, `<cwd>/.corteza/skills`, `config$skill_paths`, `load_skill_packages(config)` | `.libPaths()`-style provenance with `shadowed_by`; Anthropic-compat `~/.claude/skills` + `./.claude/skills` |
| Lifecycle | `skill_install`, `skill_remove`, `skill_list_installed`, `skill_test` | (no change) |
| Context injection | `.skill_docs` registry, loaded at `ensure_skills()` + `load_skill_docs()` on session_setup + serve startup | (mechanism unchanged; new fields gate what enters the registry) |
| Model invocation gating | none (present-or-absent) | `model_invocation: auto / user-only / disabled` |
| Tool allow/deny | none (R-handler registry is flat) | `tools.allow:` / `tools.deny:`, enforced at dispatch |
| Side-effects declaration | none | `side_effects: [none / filesystem / network / session / external]` |
| Trust provenance | none | `trust: local / project / user / package / managed` |
| Library deps | implicit (bundle code does what it does) | `library_depends:` + `attach: false` default |
| Auto-derivation | none (authored only) | `skillify_package(pkg)` + `generated_from:` stamp |
| Format versioning | none | `corteza.skill/v1` spec doc, migration path |
| Interaction shape | none | `interaction: returning / conversational` |

## New fields, one-line rationale each

- `skill_format: corteza.skill/v1` — version identifier; lets future v2 readers refuse v1 bundles gracefully and vice versa.
- `model_invocation: auto / user-only / disabled` — bundles too destructive to auto-fire (`autoresearch`, `deploy`) but worth keeping `/`-accessible get `user-only`.
- `side_effects: [...]` — drives 'this skill writes to your vault, OK?' prompts and dry-run modes.
- `tools.allow:` / `tools.deny:` — enforced at dispatch. A bundle scoped to `[read, grep]` cannot call `bash` even if the model asks. Defense in depth.
- `trust: local / project / user / package / managed` — provenance for UI affordance. `package`-trust skills inherit CRAN's review pipeline; `local` ones prompt before first invocation per session.
- `library_depends:` + `attach: false` default — declare which R packages the bundle needs, namespace-load by default (`requireNamespace()` + `pkg::fun()`), opt in to attachment.
- `generated_from: { package, version, source_hash }` — provenance stamp for bundles produced by `skillify_package()`. Lets users diff before replacing when upstream package versions change.
- `arguments.schema:` — JSON Schema for `/skill arg1 arg2` parsing.
- `interaction: returning | conversational` — UI affordance hint. `returning` skills run to a final result and exit (autoresearch, audit, lint); `conversational` skills drive multi-turn dialogue (a tutoring skill, a code-review pairing skill). Drives whether the host treats invocation as "fire, wait, surface result" vs. "hand the turn over."
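
To make "enforced at dispatch" concrete for `tools.allow:` / `tools.deny:`, here is a minimal sketch. The function names, the shape of `policy`, and the precedence rules (deny wins; a missing allow-list means unrestricted) are all assumptions for illustration, not corteza's actual dispatch code.

```r
# Hypothetical dispatch guard. `policy` is the parsed `tools:` field of the
# active bundle: list(allow = <character or NULL>, deny = <character or NULL>).
tool_permitted <- function(tool, policy) {
  if (tool %in% policy$deny) return(FALSE)   # assumption: deny always wins
  if (is.null(policy$allow)) return(TRUE)    # assumption: no allow-list = unrestricted
  tool %in% policy$allow
}

dispatch_tool <- function(tool, args, policy, registry) {
  # The check runs before the handler lookup, so a model request for a
  # non-permitted tool never reaches the handler.
  if (!tool_permitted(tool, policy)) {
    stop(sprintf("tool '%s' is not permitted by the active skill", tool),
         call. = FALSE)
  }
  handler <- registry[[tool]]
  if (is.null(handler)) stop("unknown tool: ", tool, call. = FALSE)
  do.call(handler, args)
}

# Toy registry standing in for the flat R-handler registry.
registry <- list(
  read = function(path) paste("read", path),
  bash = function(cmd) paste("ran", cmd)
)
policy <- list(allow = c("read", "grep"))

dispatch_tool("read", list(path = "notes.md"), policy, registry)
# dispatch_tool("bash", list(cmd = "ls"), policy, registry) signals an error
```

Putting the check in the dispatcher rather than in the prompt is what makes it defense in depth: the restriction holds whatever the model asks for.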

## New behaviors

- `skillify_package(pkg, summarize = FALSE)` — derive a bundle from `saber::pkg_exports()` + `saber::pkg_help()` + vignette TOC. Structured output by default. `summarize = TRUE` requires `Suggests: llm.api` and uses an LLM pass to produce prose summaries.
- Discovery provenance — `available_skills()` returns `path`, `source` (project / user / package / system), `priority`, `shadowed_by`. Name collisions warn, never silent override.
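
The provenance columns could be computed roughly as follows. `resolve_skills()` is a hypothetical helper, and the source ordering (project before user before package before system) is an assumption; the proposal names the sources but does not fix their precedence.

```r
# Hypothetical resolver: `found` has one row per discovered bundle with
# columns name, path, source. A lower `priority` number wins.
resolve_skills <- function(found,
                           source_order = c("project", "user", "package", "system")) {
  found$priority <- match(found$source, source_order)
  found <- found[order(found$name, found$priority), , drop = FALSE]
  found$shadowed_by <- NA_character_
  for (nm in unique(found$name)) {
    idx <- which(found$name == nm)
    if (length(idx) > 1L) {
      # Every lower-priority copy records the path of the winning copy,
      # and the collision is reported instead of silently overridden.
      found$shadowed_by[idx[-1L]] <- found$path[idx[1L]]
      warning(sprintf("skill '%s' found in %d locations; using %s",
                      nm, length(idx), found$path[idx[1L]]), call. = FALSE)
    }
  }
  rownames(found) <- NULL
  found
}

found <- data.frame(
  name   = c("autoresearch", "autoresearch", "lint"),
  path   = c("/lib/pensar/skills/autoresearch",
             "./.corteza/skills/autoresearch",
             "~/skills/lint"),
  source = c("package", "project", "user"),
  stringsAsFactors = FALSE
)
res <- suppressWarnings(resolve_skills(found))
res[, c("name", "source", "priority", "shadowed_by")]
```

In this example the project copy of `autoresearch` wins, and the package copy carries the project path in `shadowed_by`.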

## Open questions

1. **`tools.allow:` name resolution.** Bare names against the live corteza registry? Fully qualified (`pensar::ingest_url`)? Synthetic LLM-facing (`pensar_ingest_url`)? Pin one before v1 ships.
2. **`generated_from:` artifact storage.** Derived bundles aren't authored under `inst/`. Cache them at `tools::R_user_dir('corteza', 'cache')/skills/<pkg>/<version>/`?
3. **Anthropic compatibility round-trip.** `model_invocation: user-only` collapses to Claude Code's `disable-model-invocation: true` (boolean). Document the lossy mapping or design a compatibility shim?
4. **`attach: false` vs per-session `auto_library: true` override.** The override silently undoes the safety claim. Either remove the override or downgrade 'safe by default' to 'policy not mechanism.'
5. **Internal rename: `register_skill_from_fn` → `register_tool_from_fn`, `ensure_skills` → `ensure_tools`.** The bundle-side exports (`skill_install`, `skill_remove`, `skill_list_installed`, `skill_test`) are correctly named and keep the `skill_*` prefix. The two tool-side exports are worth a `.Deprecated()` cycle so that 'skill' refers to bundles only.
6. **YAML parser scope.** Current `parse_yaml_simple()` handles `key: value` plus JSON-in-`metadata`. Several v1 fields (`tools.allow:`, `library_depends:`, `arguments.schema:`) are nested or list-typed. Extend the parser, or take a dep on `yaml`?
7. **`interaction:` value naming.** `returning | conversational` is plain English; alternatives are `oneshot | conversational`, `batch | conversational`. Pin before v1.
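
The lossy mapping in question 3 can be shown with a small round-trip. Both functions are hypothetical; in particular, mapping `disabled` to `TRUE` and reading `TRUE` back as `user-only` are assumptions about how a shim might behave, since the boolean has no third state.

```r
# Hypothetical export: three-valued `model_invocation` to the boolean
# `disable-model-invocation` key.
to_claude <- function(model_invocation) {
  model_invocation <- match.arg(model_invocation,
                                c("auto", "user-only", "disabled"))
  model_invocation != "auto"
}

# Hypothetical import: the boolean can only recover two of the three values.
from_claude <- function(disable_model_invocation) {
  if (isTRUE(disable_model_invocation)) "user-only" else "auto"
}

round_trip <- vapply(c("auto", "user-only", "disabled"),
                     function(x) from_claude(to_claude(x)),
                     character(1))
round_trip
# "disabled" comes back as "user-only": that is the information lost.
```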

## Worked example: pensar autoresearch

Pensar ships `inst/skills/pensar/autoresearch/SKILL.md`. Under v1 its frontmatter becomes:

```yaml
skill_format: corteza.skill/v1
name: autoresearch
description: Bounded autonomous research loop into a pensar vault.
model_invocation: user-only
side_effects: [filesystem, network]
library_depends: [pensar, llm.api]
attach: false
tools:
  allow: [web_search, pensar_ingest_url, pensar_search_pages,
          pensar_related_pages, pensar_write_page, finalize]
trust: package
interaction: returning
```

The same body serves Claude Code (via `~/.claude/skills/pensar/` symlink), corteza chat (via `load_skill_packages()`), and a forthcoming `pensar::autoresearch()` R callable that reads its own SKILL.md via `corteza::parse_skill_md()` (proposed export) and drives `llm.api::agent()` with the named tools as registered wrappers. One source of procedural truth, three consumers.
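
A host could honor the `library_depends:` and `attach: false` entries in this frontmatter along the following lines. This is a sketch only: `load_skill_deps()` is a hypothetical name, and the usage line substitutes the base packages `stats` and `utils` for `pensar` and `llm.api` so that it runs on any R installation.

```r
# Hypothetical loader for `library_depends` + `attach`.
load_skill_deps <- function(library_depends, attach = FALSE) {
  # Namespace-load each dependency without touching the search path.
  available <- vapply(library_depends, requireNamespace,
                      logical(1), quietly = TRUE)
  missing_pkgs <- library_depends[!available]
  if (length(missing_pkgs) > 0L) {
    stop("skill needs missing packages: ",
         paste(missing_pkgs, collapse = ", "), call. = FALSE)
  }
  # Attachment is opt-in; with the default, bundle code calls pkg::fun().
  if (isTRUE(attach)) {
    for (pkg in library_depends) library(pkg, character.only = TRUE)
  }
  invisible(library_depends)
}

deps <- load_skill_deps(c("stats", "utils"))  # attach = FALSE by default
stats::median(c(1, 5, 9))
```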

## Out of scope for this issue

- Saber-side context-injection of skill bodies into non-corteza sessions (Claude Code, Codex). Saber already does this generally via `agent_context()`; teaching it about skills is a separate, smaller change once v1 stabilizes.


