
feat: model-agnostic agent and skill templates #10

Open
ahmedibrahim085 wants to merge 116 commits into disler:main from ahmedibrahim085:feat/agent-templates

Conversation

@ahmedibrahim085

Summary

  • Add 4 agent templates and 1 skill template for nano-agent dispatch
  • Templates define roles (reviewer, researcher, implementer, teammate) independently from models (user customizes model/provider)
  • Based on Anthropic's official agent file spec and best practices from their multi-agent research
  • Reviewed by Z.ai (GLM-5) and LM Studio (qwen3-coder-next) — all critical findings addressed

What's Included

| File | Purpose |
| --- | --- |
| templates/agents/README.md | Installation, customization, advanced config, troubleshooting |
| templates/agents/nano-reviewer.md | Code review with external LLM cross-check |
| templates/agents/nano-researcher.md | Codebase research with external LLM analysis |
| templates/agents/nano-implementer.md | Code implementation via external LLM dispatch |
| templates/agents/nano-teammate.md | Peer teammate for agent teams |
| templates/skills/README.md | Skill installation, skills vs agents comparison |
| templates/skills/nano-dispatch/SKILL.md | Quick /nano-dispatch <task> slash command |

Design Decisions

  1. Templates, not managed agents — users copy, customize, and own their agents
  2. Model/role separation — model: inherit in frontmatter, YOUR_MODEL/YOUR_PROVIDER placeholders in dispatch calls
  3. Anthropic-spec compliant — frontmatter fields follow official spec (name, description, tools, model, allowed-tools for skills)
  4. Evidence-based prompts — all templates include evidence rules and verification steps
  5. Full provider table — correct model names verified against constants.py (including date-suffixed Anthropic models)

Review Findings Addressed

| Finding | Source | Resolution |
| --- | --- | --- |
| Anthropic model names need date suffixes | zai + lms | Fixed: full names in README |
| check_providers() ambiguous syntax | zai | Fixed: "Use the check_providers tool" |
| model: sonnet hardcoded in teammate | zai + lms | Fixed: changed to model: inherit |
| launch_agent listed but unexplained | zai + lms | Fixed: removed from teammate tools |
| Fragile CLI hack in README | zai + lms | Fixed: removed entirely |
| No troubleshooting section | lms | Fixed: added to README |
| No Advanced Configuration docs | lms | Fixed: added maxTurns, permissionMode, hooks, etc. |
| allowed-tools vs tools confusion | zai | Fixed: documented in skills README |
| workspace placeholder unclear | lms | Fixed: workspace="" with comment |
| Qwen auth unclear | zai | Fixed: "Requires prior qwen CLI authentication" |

Test plan

  • Copy nano-reviewer.md to ~/.claude/agents/ and verify it appears in /agents
  • Customize YOUR_MODEL/YOUR_PROVIDER and verify dispatch works
  • Copy nano-dispatch/ to ~/.claude/skills/ and verify /nano-dispatch appears
  • Verify all model names in README match constants.py AVAILABLE_MODELS
  • Test teammate template in an agent team with TeamCreate

Previously check_api_key() always demanded OPENAI_API_KEY even when
using Ollama or other providers that don't need it. Now it looks up
the required key from PROVIDER_REQUIREMENTS based on the provider.
- Fix Ollama localhost→127.0.0.1 (dual-instance IPv4/IPv6 split)
- Add LM Studio provider (port 1234, OpenAI-compat)
- Add Z.ai provider via LitellmModel (Anthropic-protocol bridge)
- Fix telemetry: unconditionally disable OpenAI tracing for non-OpenAI
- Replace static model whitelist with dynamic API validation for local
  providers (Ollama, LM Studio query their running service at runtime)
- Add "lmstudio" and "zai" to provider Literal types
- Add Z_AI_API_KEY to PROVIDER_REQUIREMENTS
- Add ZAI_BASE_URL and ZAI_AVAILABLE_MODELS constants
- Extend Ollama model list with qwen3-coder, gemma3, magistral
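The provider-keyed lookup described in the check_api_key fix above can be sketched as follows (a minimal sketch; the mapping entries mirror the providers named in this PR, but the exact production table lives in the project's constants):

```python
import os

# Illustrative stand-in for PROVIDER_REQUIREMENTS: maps each provider to
# the env var it needs, or None for local providers that need no key.
PROVIDER_REQUIREMENTS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "zai": "Z_AI_API_KEY",
    "ollama": None,      # local provider, no API key required
    "lmstudio": None,    # local provider, no API key required
}

def check_api_key(provider: str) -> None:
    # Look up the required key for this provider instead of always
    # demanding OPENAI_API_KEY.
    required = PROVIDER_REQUIREMENTS.get(provider)
    if required and not os.getenv(required):
        raise ValueError(f"{provider} requires the {required} environment variable")
```

With this shape, `check_api_key("ollama")` passes with no keys set, while a cloud provider raises only when its own key is missing.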
openai-agents[litellm] enables LitellmModel which translates between
OpenAI chat completions format and Anthropic Messages API, allowing
Z.ai's Anthropic-compatible endpoint to work with the OpenAI Agent SDK.
- Add FastAPI + uvicorn dependencies and nano-web entry point
- Create /api/providers endpoint (health check all 5 providers)
- Create /api/models endpoint (catalog across all providers)
- Create /api/run endpoint (execute agent prompts)
- Increase MAX_TOKENS to 16000 for longer agent responses
Single-file HTML dashboard with Tailwind CSS featuring:
- Provider health cards (5 providers, auto-refresh 30s)
- Model catalog table (searchable, sortable, 60+ models)
- Agent playground (provider/model selection, prompt execution)
- Dark theme with indigo/violet accents
- Bug fixes: dropdown population, execution_time path, selection persistence
…es 4-6)

Backend: history tracking with in-memory storage, config API with masked
keys, agent CRUD for ~/.claude/agents/nano-agent-*.md files.
Frontend: generated by Z.ai GLM-4.7, integrated into dashboard.
Adds shell execution via @function_tool (works with all providers),
workspace directory isolation, and upgraded system prompt for coding
agent behavior. Inspired by OpenAI Cookbook's coding agent pattern.
…e-flow

Migration from ~/nano-agent to ~/ai_storage/projects/nano-agent/.
Fork: ahmedibrahim085/nano-agent. Upstream: disler/nano-agent.
… and agent configs

Replace 3 module-level mutable globals with contextvars.ContextVar for
async-safe isolation when multiple agents run concurrently in the MCP
server (_workspace_dir, _last_tool_args, _pending_tool_args).
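A minimal sketch of the ContextVar pattern described above, assuming simplified set_workspace/get_workspace helpers: each asyncio task gets its own copy of the context, so concurrently running agents stay isolated without locks.

```python
import asyncio
from contextvars import ContextVar
from pathlib import Path

# Per-context workspace instead of a module-level mutable global.
_workspace_dir: ContextVar[Path] = ContextVar("_workspace_dir", default=Path.cwd())

def set_workspace(path: Path) -> None:
    _workspace_dir.set(path)

def get_workspace() -> Path:
    return _workspace_dir.get()

async def agent_task(name: str, workspace: Path, results: dict) -> None:
    set_workspace(workspace)          # visible only inside this task's context
    await asyncio.sleep(0)            # yield so the two tasks interleave
    results[name] = get_workspace()   # still sees its own value, not the other's

async def main() -> dict:
    results: dict = {}
    await asyncio.gather(
        agent_task("a", Path("/tmp/agent-a"), results),
        agent_task("b", Path("/tmp/agent-b"), results),
    )
    return results
```

asyncio copies the current context into every new task, so a `set()` in one agent never leaks into another — which is exactly why a plain module global breaks under concurrent MCP calls.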

Fix resolve_path() to use get_workspace() instead of Path.cwd() so
relative paths resolve correctly per-agent. Add async
validate_provider_setup_async() using httpx.AsyncClient to avoid
blocking the event loop with sync requests.get().

Wire async validation into _execute_nano_agent_async(). Restore tracing
state on provider switch (OpenAI=enabled, others=disabled).

Update all 15 agent configs with Planner-Executor framing, workspace
parameter extraction, and cost-optimized relay models. Convert all 8
hooks to absolute paths to prevent CWD-drift bootstrap deadlocks.

Tests: 9 concurrency tests + 3 workspace resolution tests, 0 regressions.
set_tracing_disabled() is a process-wide singleton — toggling it
per-provider causes races when agents with different providers run
concurrently. Disable unconditionally since we don't need OpenAI
telemetry regardless of provider.
Replace 8 hardcoded absolute paths in .claude/settings.json with
$CLAUDE_PROJECT_DIR for portability across machines and users.
…parameter

- Add LaunchAgentRequest Pydantic model to data_types.py with required agent_path field
- Add instructions_override parameter to _execute_nano_agent_async() and _execute_nano_agent()
- When instructions_override is set, it replaces NANO_AGENT_SYSTEM_PROMPT as base instructions
- Default None preserves backward compatibility for prompt_nano_agent()
…and build_layered_prompt

- Create agent_identity.py with two public functions
- read_agent_instructions() validates and reads AGENT.md from agent directory
- build_layered_prompt() assembles 3-layer system prompt (Base + Agent + Project)
- Dedup: skips Project Instructions when agent_path == workspace
- Graceful degradation: workspace AGENT.md read failures are warnings, not errors
- Add launch_agent() async function to nano_agent.py with agent_path parameter
- Reads AGENT.md for identity, builds layered prompt, executes with instructions_override
- Register launch_agent as MCP tool in __main__.py
- Update server instructions to document both tools
- 7 tests for read_agent_instructions (success, whitespace, path errors, unicode, relative)
- 9 tests for build_layered_prompt (layers, order, dedup, empty, read errors)
- Uses tmp_path fixture for all filesystem operations
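The 3-layer assembly can be sketched roughly as follows; the section headers and the base-prompt placeholder are assumptions, not the production strings:

```python
from pathlib import Path

def build_layered_prompt(base: str, agent_path: Path, workspace: Path) -> str:
    """Assemble Base + Agent + Project layers, with dedup and graceful degradation."""
    layers = [base]
    # Agent identity is required: a read failure here should be an error.
    agent_md = agent_path / "AGENT.md"
    layers.append("## Agent Instructions\n" + agent_md.read_text().strip())
    # Dedup: skip the project layer when the agent dir IS the workspace.
    project_md = workspace / "AGENT.md"
    if agent_path.resolve() != workspace.resolve() and project_md.exists():
        try:
            layers.append("## Project Instructions\n" + project_md.read_text().strip())
        except OSError:
            pass  # graceful degradation: a warning in production, not an error
    return "\n\n".join(layers)
```

The dedup guard prevents the same AGENT.md from appearing twice when an agent is launched inside its own directory.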
…n + platform skip

Accepted from GLM-4.7 review of Qwen-Next code:
- read_agent_instructions() now raises ValueError if AGENT.md is empty after stripping
- Added test_empty_agent_md test case
- Added @pytest.mark.skipif for Windows on chmod-based test
Problem: The check_providers MCP tool needs typed request/response models
to ensure consistent serialization across both the happy path (.model_dump())
and the error path (manual dict). Without a shared schema, the two paths
can drift — the error path might include fields the model doesn't define,
or omit fields the client expects.

Approach: Add ProviderHealthStatus (per-provider status with up/down/partial
enum, model list, latency, error) and CheckProvidersResponse (aggregated
result with provider dict, counters, total time). Include an explicit
`error: Optional[str]` field on CheckProvidersResponse so the error path
can return the same shape as the happy path instead of ad-hoc dict keys.
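A hedged sketch of the shared-shape idea, assuming pydantic v2; the field names here are illustrative rather than the exact production schema:

```python
from enum import Enum
from typing import Dict, List, Optional
from pydantic import BaseModel

class ProviderState(str, Enum):
    UP = "up"
    DOWN = "down"
    PARTIAL = "partial"

class ProviderHealthStatus(BaseModel):
    status: ProviderState
    models: List[str] = []
    latency_ms: Optional[float] = None
    error: Optional[str] = None

class CheckProvidersResponse(BaseModel):
    success: bool
    providers: Dict[str, ProviderHealthStatus] = {}
    up_count: int = 0
    down_count: int = 0
    total_time_ms: float = 0.0
    error: Optional[str] = None  # lets the error path reuse the happy-path shape

# Both paths serialize through the same model, so the key sets cannot drift.
err = CheckProvidersResponse(success=False, error="orchestrator failed").model_dump()
ok = CheckProvidersResponse(success=True).model_dump()
assert set(err) == set(ok)
```

Declaring `error` on the model is what lets the error path return `.model_dump()` output instead of an ad-hoc dict with its own key set.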
…ctor

Problem: No way to check if providers (OpenAI, Anthropic, Ollama, LM Studio,
Z.ai) are reachable and which models they serve. Additionally, the local
provider connection config (URLs, endpoints, model extractors) was duplicated
4 times across validate_provider_setup, validate_provider_setup_async, and
_check_provider_health for ollama/lmstudio — a maintenance hazard. The
AVAILABLE_MODELS and ZAI_AVAILABLE_MODELS imports were unnecessarily deferred
inside function bodies despite constants.py having zero internal imports.

Approach: Extract LOCAL_PROVIDER_CONFIG as a module-level constant mapping
provider name to (base_url, endpoint, extractor_fn, start_hint) tuple. All 4
call sites now reference this single source of truth. Hoist AVAILABLE_MODELS
and ZAI_AVAILABLE_MODELS imports to top-level, removing 3 deferred imports.
Add _check_provider_health() for per-provider async health checks using httpx,
and check_all_providers_async() that runs all 5 checks concurrently via
asyncio.gather with return_exceptions=True so one failure doesn't abort others.
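The concurrent fan-out can be sketched like this; the config values and the simulated failure are illustrative stand-ins for the real httpx-based checks:

```python
import asyncio
from typing import Callable, Dict, Tuple

# Hypothetical mirror of LOCAL_PROVIDER_CONFIG:
# provider -> (base_url, endpoint, extractor_fn, start_hint)
LOCAL_PROVIDER_CONFIG: Dict[str, Tuple[str, str, Callable, str]] = {
    "ollama": ("http://127.0.0.1:11434", "/api/tags",
               lambda d: [m["name"] for m in d.get("models", [])],
               "start with `ollama serve`"),
    "lmstudio": ("http://127.0.0.1:1234", "/v1/models",
                 lambda d: [m["id"] for m in d.get("data", [])],
                 "start the LM Studio server"),
}

async def _check_provider_health(name: str) -> dict:
    # Stand-in for the real check; simulate one provider being down.
    if name == "lmstudio":
        raise ConnectionError("connection refused")
    return {"provider": name, "status": "up"}

async def check_all_providers_async(names) -> dict:
    results = await asyncio.gather(
        *(_check_provider_health(n) for n in names),
        return_exceptions=True,  # one failure doesn't abort the others
    )
    out = {}
    for name, res in zip(names, results):
        if isinstance(res, Exception):
            out[name] = {"provider": name, "status": "down", "error": str(res)}
        else:
            out[name] = res
    return out
```

With `return_exceptions=True`, a refused connection comes back as a value in the results list rather than cancelling the sibling checks, so the aggregate report always covers all providers.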
Problem: The health check engine (check_all_providers_async) exists in
provider_config but is not exposed as an MCP tool. Callers have no way
to invoke it through the MCP protocol. Also, the previous US-002 draft
left a dead CheckProvidersResponse import in nano_agent.py that was never
referenced by name (check_providers returns Dict[str, Any]).

Approach: Add check_providers() async function in nano_agent.py as the MCP
tool wrapper — it delegates to check_all_providers_async(), converts the
Pydantic response to dict via .model_dump(), and handles errors by returning
a dict with the same shape (success=False, error=str). Register it in
__main__.py via mcp.tool()(check_providers). Remove the unused
CheckProvidersResponse import since the tool returns plain dicts for MCP
protocol compatibility.
Problem: The health check feature had zero test coverage. Code review
identified 6 specific gaps: LM Studio empty models and timeout paths,
Ollama timeout path, OpenAI unexpected HTTP status (e.g. 500), the
check_providers error path when the orchestrator throws, and unknown
provider handling. Without these, regressions in error paths would go
undetected.

Approach: 28 async pytest tests organized by provider section. Each test
mocks httpx.AsyncClient (or os.getenv for API keys) to simulate specific
provider states without network calls. Coverage spans: 5 OpenAI states
(up, 401, unreachable, timeout, missing key, 500), 2 Anthropic, 2 Z.ai,
4 Ollama (up, partial, down, empty, timeout), 3 LM Studio (up, down,
empty, timeout), 7 concurrent execution scenarios (all up, mixed counters,
timing verification, one exception, all down, schema validation), 2 MCP
tool integration (happy path, error path), and 1 unknown provider.
Adds launch_agent() MCP tool with identity-aware execution via AGENT.md files.
Combines US-002 (provider health check) with US-010 (launch_agent)
already on main. Resolves import/registration conflicts in __main__.py
and nano_agent.py to include all three MCP tools: prompt_nano_agent,
launch_agent, and check_providers.
The "run_command" name misleads LLM agents into thinking the tool only
handles single commands. Renaming to "bash" signals that chained commands
(&&, ;, |), scripts, and pipelines are supported — matching Claude Code's
Bash tool naming convention.

Update the system prompt tool documentation and rules section to reference
bash() with expanded usage examples including chained commands.
Three problems with the old run_command tool:
1. Name misleads agents — they don't realize they can chain with &&, ;, pipes
2. Output truncated at 8K chars — test suites and build logs are often 10-20K
3. CWD resets every call — agents must use absolute paths everywhere

Rename the function to bash (the @function_tool decorator auto-derives the
tool name from the Python function name). Increase output cap from 8K to 30K
chars with proportional 60/35 head/tail split via named constants.
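A minimal sketch of the proportional head/tail cap; the constant names are assumptions, not the production identifiers:

```python
MAX_OUTPUT_CHARS = 30_000
HEAD_FRACTION = 0.60   # 60% of the budget keeps the start of the output
TAIL_FRACTION = 0.35   # 35% keeps the end; the remainder marks the elision

def truncate_output(text: str) -> str:
    """Return text unchanged under the cap, else keep head + tail with a marker."""
    if len(text) <= MAX_OUTPUT_CHARS:
        return text
    head = int(MAX_OUTPUT_CHARS * HEAD_FRACTION)
    tail = int(MAX_OUTPUT_CHARS * TAIL_FRACTION)
    omitted = len(text) - head - tail
    return f"{text[:head]}\n... [{omitted} chars truncated] ...\n{text[-tail:]}"
```

Keeping both ends matters for agents: build logs put the command at the top and the failure at the bottom, and a head-only cap would hide the failure.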

Add persistent CWD tracking via a ContextVar (_bash_cwd_var) that survives
across bash calls within a session. A shell wrapper appends a unique marker
and pwd after each command, which _parse_cwd_from_output() extracts and
stores. The marker is stripped before returning output to the agent. Exit
codes are preserved via a saved $? variable. set_workspace() resets CWD
tracking when a new agent session begins.
Covers four areas of the run_command → bash migration:
- Constants: TOOL_BASH exists, AVAILABLE_TOOLS updated, system prompt refs bash
- Function: bash in tool list, basic command execution
- Output cap: no truncation under 30K, truncation over 30K, head/tail preserved
- Persistent CWD: defaults to workspace, persists after cd, unchanged after
  failed cd, marker stripped from output, exit code preserved, concurrent task
  isolation via ContextVar, reset on set_workspace

Uses FunctionTool.on_invoke_tool API for integration tests to exercise the
full @function_tool decorator path.
Sync CLAUDE.md and KNOWLEDGE_TRANSFER.md with the renamed bash tool.
Historical git commit references in KNOWLEDGE_TRANSFER.md are left
unchanged since they describe what the original commit messages said.
Narrative release note covering the three problems solved (misleading name,
8K output cap, CWD reset), the approach taken for each, what was NOT changed
and why, live verification results, and full test coverage summary.
Rename run_command → bash, increase output cap 8K → 30K, add persistent
CWD tracking via ContextVar. 19 tests, zero regressions, live-verified
with Qwen3-Coder 30B.
Mark bash tool rename (PR #1) as shipped in Phase 1 table. Update US-002
Provider Health Check to shipped with all acceptance criteria checked.
Refresh tool references from run_command to bash in baseline table and
architecture diagram. Update dependency graph — US-003 now unblocked.
Add git_status, git_commit, git_branch, git_diff tools (8→12 total).
Safety guards block force push, hard reset, git clean, protected branch
deletion, checkout-dot, and alias bypass. 35 tests all passing.
- test_bash_tool.py: 4 new process-group tests (TestBashProcessGroups)
- test_bash_background.py: 21 new tests across 7 classes
- All 25 new tests fail (ImportError) — production code not yet implemented
- 19 existing tests unaffected
- Add TOOL_BASH_BACKGROUND = "bash_background" constant
- Update AVAILABLE_TOOLS from 12 to 13 items
- Update NANO_AGENT_SYSTEM_PROMPT with bash_background docs and tool count
Tier 1 — Fix orphan process bug:
- Add start_new_session=True to bash() subprocess creation
- Replace proc.kill() with _kill_process_tree() using os.killpg
- Add _HAS_PROCESS_GROUPS platform guard for Windows compat

Tier 3 — Add bash_background tool (#13):
- New @function_tool with PID tracking via ContextVar
- Output capture to workspace file (agent reads via read_file)
- Process limit (MAX_BACKGROUND_PROCESSES=5) with dead-proc pruning
- Graceful cleanup: SIGTERM → 3s wait → SIGKILL
- Crash recovery: set_workspace() kills leftover PIDs
- Zombie reaping with os.waitpid(WNOHANG)
- Wrap Runner.run() in try/finally with asyncio.shield
- Import _cleanup_background_processes from nano_agent_tools
- Ensures background processes are killed when agent completes or fails
- asyncio.shield prevents CancelledError from aborting cleanup
- Fix asyncio.run inside event loop in context isolation test
- Fix ProcessLookupError in process groups edge case test
- Update git_tools test assertions from 12 to 13 tools
…5-nano

GPT-5 family models all reject custom temperature via the OpenAI API.
gpt-5 already had supports_temperature=False, but gpt-5-mini and
gpt-5-nano were missing it, causing API errors in 17 tests.
…icing key

- Delete test_track_tool and test_checkpoint (test nonexistent methods)
- Remove track_tool/tool_usage references from test_reset, test_generate_report,
  test_report_to_dict, test_report_format_summary
- Fix output_cost assertion in test_generate_report (reasoning tokens add to cost)
- Fix MODEL_PRICING key from "gpt-oss" to "ollama" in test_oss_models_free
- Fix Ollama base_url: localhost to 127.0.0.1 (matches production)
- Fix error message casing: "Ollama" to "ollama" (matches interpolation)
- Fix tracing assertion: set_tracing_disabled(True) not False (global disable)
Also add skipif for ANTHROPIC_API_KEY so it skips cleanly in CI.
…ipif

- Add skipif(OPENAI_API_KEY) to TestExecuteNanoAgent and TestIntegration classes
- Add skipif to test_prompt_nano_agent_basic and test_prompt_nano_agent_default_parameters
- Fix invalid provider assertion for 6-provider Pydantic Literal error
- Fix available_models assertion for dict structure (keyed by provider)
- Add assertions for provider count (>=3) and tool count (==13)
- Fix turns_used assertion (can be None when result has no messages)
- Fix timestamp assertion (async path uses 'model' key, not 'timestamp')
Matches production code (provider_config.py) which uses 127.0.0.1
to avoid IPv4/IPv6 dual-instance bug on macOS.
- test_model_capabilities: gpt-5-mini temperature is now None (not 0.2)
  because supports_temperature=False was added in the production fix
- test_nano_agent_integration: turns_used can be None when result lacks
  messages attribute — check key exists instead of comparing to int
Production fix: add supports_temperature=False to gpt-5-mini and gpt-5-nano.
Test fixes: remove dead API references, fix stale assertions, add skipif guards,
fix URL/casing/tracing mismatches, add missing asyncio decorator.

Result: 0 failed, 464 passed, 2 skipped (missing ANTHROPIC_API_KEY).
…silience

A1: Restrict CORS to localhost origins (server.py)
  - Changed allow_origins from ["*"] to localhost:8484 + 127.0.0.1:8484
  - Prevents malicious websites from calling the dashboard API

A2: Add workspace focus boundary to file tools (nano_agent_tools.py)
  - Added resolve().relative_to(workspace) check to read_file_raw,
    write_file_raw, and edit_file_raw — matching existing pattern
    in search_files_raw and _raw_run_tests
  - Agent discipline: keeps agents focused on project directory
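The A2 boundary check can be sketched as below; `ensure_in_workspace` is a hypothetical name for the inlined resolve().relative_to() guard:

```python
from pathlib import Path

def ensure_in_workspace(path: str, workspace: Path) -> Path:
    """Resolve path under workspace, refusing anything that escapes it."""
    resolved = (workspace / path).resolve()
    try:
        # relative_to raises ValueError when resolved is outside workspace,
        # which catches ../ traversal after symlinks are resolved.
        resolved.relative_to(workspace.resolve())
    except ValueError:
        raise PermissionError(f"{path} is outside the workspace {workspace}")
    return resolved
```

Resolving before the containment check is the important part: a naive prefix test on the raw string would wave through `../` components and symlinked escapes.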

A3: Make background process cleanup resilient to cancellation
  - Added _force_kill_remaining() synchronous fallback
  - _cleanup_background_processes catches asyncio.CancelledError
    and force-kills remaining processes to prevent orphans

Tests: 11 new in test_security_fixes.py, 21 existing tests updated
  with workspace fixtures. 475 passed, 2 skipped, 0 failed.
Ship starter templates that define roles independently from models.
Users copy, customize model/provider, and install to ~/.claude/agents/
or .claude/agents/. Based on Anthropic's official agent file spec and
best practices, reviewed by Z.ai (GLM-5) and LM Studio (qwen3-coder-next).

Templates: nano-reviewer, nano-researcher, nano-implementer, nano-teammate
Skill: nano-dispatch (slash command for quick one-off dispatch)
- Add memory: user to reviewer and teammate templates
- Add mcpServers: nano-agent to teammate frontmatter
- Add --agents CLI flag for session-only testing
- Document multi-agent coexistence (no overwriting)
- Add spawn-time context guidance for agent teams
- Document all 3 MCP tools (including launch_agent)
- Add background agent note with MCP caveat
- State nano-agent ownership boundary explicitly
- Note that templates can be renamed (name field, not filename)
Templates are now bundled inside the Python package at
src/nano_agent/templates/ for runtime discoverability via
importlib.resources. The repo-root templates/ directory is
replaced with a README pointing to the new location.

This follows PyPA best practice: data files that the code
needs at runtime must live inside the package boundary.
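The importlib.resources pattern works like this; `load_package_text` is a generic stand-in, demonstrated against a stdlib package so the sketch runs anywhere, whereas the real code reads from `nano_agent/templates`:

```python
from importlib import resources

def load_package_text(package: str, *parts: str) -> str:
    """Read a text file bundled inside an importable package."""
    # files() resolves data whether the package is installed as a wheel,
    # a zip, or an editable checkout, unlike paths built from __file__.
    node = resources.files(package)
    for part in parts:
        node = node / part
    return node.read_text(encoding="utf-8")

# In nano-agent this would be something like:
#   load_package_text("nano_agent", "templates", "agents", "nano-reviewer.md")
sample = load_package_text("email", "__init__.py")
```

This is why the templates moved inside the package boundary: `importlib.resources` can only see files that ship with an importable package, not a loose repo-root directory.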
Identity templates provide persistent worker personas for the
launch_agent MCP tool. Each defines a role (general-coder,
code-reviewer, tdd-engineer, backend-expert) without duplicating
the 13-tool listing already in the base system prompt.

Unlike Claude Code agent templates (YAML frontmatter), these are
plain Markdown files consumed by the agent_identity module.
Decision guide helps users and Claude Code choose between MCP
tools, teammates, subagents, background agents, and skills.
Recipes provide step-by-step workflows for the 5 scenarios
we have actually tested: MCP dispatch, teammate collaboration,
background bash, launch_agent identity, and skill dispatch.
New module template_resources.py provides load_template() for
reading bundled template files via importlib.resources, and
register_template_resources() for wiring them as MCP resources.

Tests cover: all template categories, missing files, invalid
categories, path traversal, registry completeness, manifest
structure, custom URI scheme, MCP resource registration.

16 tests, all passing. Zero regressions on 491 existing tests.
Templates are now discoverable via MCP resources protocol.
Claude Code sessions can list and read all templates without
needing the nano-agent repo cloned.

Also updates the MCP server description with an abbreviated
"When to Use What" guide and resource discovery instructions.
…e guides

Rewrote all 5 recipes to use correct MCP tool call syntax instead of
fake Python imports, removed non-existent parameters (context={},
working_dir, .dispatch()), and made all examples provider-agnostic
with YOUR_MODEL/YOUR_PROVIDER placeholders.

Created 3 new guides: installation.md, multi-instance.md, and
team-patterns.md. Updated when-to-use-what.md with goal-oriented
lookup table. Registered 3 new entries in TEMPLATE_REGISTRY (23 total).
… providers

Added shared conftest.py with Ollama auto-detection (127.0.0.1, not
localhost, to avoid IPv4/IPv6 dual-instance bug). All integration tests
now use the smallest available local model via ollama_model fixture.

Fixed GPT-5 production tests to skip gracefully on 404 and corrected
invalid reasoning_effort="minimal" to "none". Cleaned up unused imports.

Result: 486 passed, 0 failed, 14 skipped.
