feat: model-agnostic agent and skill templates#10
Open
ahmedibrahim085 wants to merge 116 commits into disler:main from
Conversation
Previously check_api_key() always demanded OPENAI_API_KEY even when using Ollama or other providers that don't need it. Now it looks up the required key from PROVIDER_REQUIREMENTS based on the provider.
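A minimal sketch of the lookup described above. The mapping and helper are illustrative reconstructions — the real `PROVIDER_REQUIREMENTS` lives in nano-agent's provider config and may carry more entries:

```python
import os

# Hypothetical provider -> required env var mapping; None means the
# provider is local and needs no API key.
PROVIDER_REQUIREMENTS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "zai": "Z_AI_API_KEY",
    "ollama": None,      # local server, no key needed
    "lmstudio": None,    # local server, no key needed
}

def check_api_key(provider: str) -> bool:
    """Return True when the provider's required key is set (or not needed)."""
    required = PROVIDER_REQUIREMENTS.get(provider)
    if required is None:
        return True  # local providers need no API key
    return bool(os.getenv(required))
```

With this shape, adding a provider means adding one dict entry rather than editing the check itself.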
- Fix Ollama localhost → 127.0.0.1 (dual-instance IPv4/IPv6 split)
- Add LM Studio provider (port 1234, OpenAI-compat)
- Add Z.ai provider via LitellmModel (Anthropic-protocol bridge)
- Fix telemetry: unconditionally disable OpenAI tracing for non-OpenAI
- Replace static model whitelist with dynamic API validation for local providers (Ollama, LM Studio query their running service at runtime)
- Add "lmstudio" and "zai" to provider Literal types
- Add Z_AI_API_KEY to PROVIDER_REQUIREMENTS
- Add ZAI_BASE_URL and ZAI_AVAILABLE_MODELS constants
- Extend Ollama model list with qwen3-coder, gemma3, magistral
openai-agents[litellm] enables LitellmModel which translates between OpenAI chat completions format and Anthropic Messages API, allowing Z.ai's Anthropic-compatible endpoint to work with the OpenAI Agent SDK.
- Add FastAPI + uvicorn dependencies and nano-web entry point
- Create /api/providers endpoint (health check all 5 providers)
- Create /api/models endpoint (catalog across all providers)
- Create /api/run endpoint (execute agent prompts)
- Increase MAX_TOKENS to 16000 for longer agent responses
Single-file HTML dashboard with Tailwind CSS featuring:
- Provider health cards (5 providers, auto-refresh 30s)
- Model catalog table (searchable, sortable, 60+ models)
- Agent playground (provider/model selection, prompt execution)
- Dark theme with indigo/violet accents
- Bug fixes: dropdown population, execution_time path, selection persistence
…es 4-6)
Backend: history tracking with in-memory storage, config API with masked keys, agent CRUD for ~/.claude/agents/nano-agent-*.md files.
Frontend: generated by Z.ai GLM-4.7, integrated into dashboard.
Adds shell execution via @function_tool (works with all providers), workspace directory isolation, and upgraded system prompt for coding agent behavior. Inspired by OpenAI Cookbook's coding agent pattern.
…e-flow Migration from ~/nano-agent to ~/ai_storage/projects/nano-agent/. Fork: ahmedibrahim085/nano-agent. Upstream: disler/nano-agent.
… and agent configs
Replace 3 module-level mutable globals with contextvars.ContextVar for async-safe isolation when multiple agents run concurrently in the MCP server (_workspace_dir, _last_tool_args, _pending_tool_args).
Fix resolve_path() to use get_workspace() instead of Path.cwd() so relative paths resolve correctly per-agent.
Add async validate_provider_setup_async() using httpx.AsyncClient to avoid blocking the event loop with sync requests.get(). Wire async validation into _execute_nano_agent_async().
Restore tracing state on provider switch (OpenAI=enabled, others=disabled).
Update all 15 agent configs with Planner-Executor framing, workspace parameter extraction, and cost-optimized relay models.
Convert all 8 hooks to absolute paths to prevent CWD-drift bootstrap deadlocks.
Tests: 9 concurrency tests + 3 workspace resolution tests, 0 regressions.
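The ContextVar pattern described here can be sketched as follows. This is a simplified illustration, not the real module — only `_workspace_dir` is shown, and the helper names mirror the commit:

```python
import asyncio
from contextvars import ContextVar
from pathlib import Path

# Each asyncio task gets its own context copy, so concurrent agents
# cannot clobber each other's workspace (unlike a module-level global).
_workspace_dir: ContextVar[Path] = ContextVar("_workspace_dir", default=Path("/"))

def set_workspace(path: Path) -> None:
    _workspace_dir.set(path)

def get_workspace() -> Path:
    return _workspace_dir.get()

def resolve_path(rel: str) -> Path:
    # Resolve relative to the per-agent workspace, not Path.cwd().
    return get_workspace() / rel

async def agent_task(workspace: str, results: dict) -> None:
    set_workspace(Path(workspace))
    await asyncio.sleep(0)  # yield so the two tasks interleave
    results[workspace] = str(resolve_path("notes.md"))

async def run_two_agents() -> dict:
    results: dict = {}
    await asyncio.gather(agent_task("/tmp/a", results), agent_task("/tmp/b", results))
    return results
```

Because `asyncio.gather` runs each coroutine in its own task (and each task copies the current context), the two `set_workspace` calls stay isolated even though they interleave.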
set_tracing_disabled() is a process-wide singleton — toggling it per-provider causes races when agents with different providers run concurrently. Disable unconditionally since we don't need OpenAI telemetry regardless of provider.
Replace 8 hardcoded absolute paths in .claude/settings.json with $CLAUDE_PROJECT_DIR for portability across machines and users.
…parameter
- Add LaunchAgentRequest Pydantic model to data_types.py with required agent_path field
- Add instructions_override parameter to _execute_nano_agent_async() and _execute_nano_agent()
- When instructions_override is set, it replaces NANO_AGENT_SYSTEM_PROMPT as base instructions
- Default None preserves backward compatibility for prompt_nano_agent()
…and build_layered_prompt
- Create agent_identity.py with two public functions
- read_agent_instructions() validates and reads AGENT.md from agent directory
- build_layered_prompt() assembles 3-layer system prompt (Base + Agent + Project)
- Dedup: skips Project Instructions when agent_path == workspace
- Graceful degradation: workspace AGENT.md read failures are warnings, not errors
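A sketch of the two functions under the behavior listed above. The base-prompt constant and exact layer separators are assumptions; the real agent_identity.py may differ:

```python
from pathlib import Path

BASE_PROMPT = "You are a nano agent."  # stand-in for NANO_AGENT_SYSTEM_PROMPT

def read_agent_instructions(agent_path: Path) -> str:
    """Read and validate AGENT.md from the agent directory."""
    text = (agent_path / "AGENT.md").read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError("AGENT.md is empty")  # per the GLM-4.7 review fix
    return text

def build_layered_prompt(agent_path: Path, workspace: Path) -> str:
    """Assemble Base + Agent + Project layers into one system prompt."""
    layers = [BASE_PROMPT, read_agent_instructions(agent_path)]
    # Dedup: skip the project layer when the agent dir IS the workspace.
    if agent_path.resolve() != workspace.resolve():
        project_md = workspace / "AGENT.md"
        if project_md.exists():
            try:
                layers.append(project_md.read_text(encoding="utf-8").strip())
            except OSError:
                pass  # graceful degradation: a warning in the real code
    return "\n\n".join(layers)
```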
- Add launch_agent() async function to nano_agent.py with agent_path parameter
- Reads AGENT.md for identity, builds layered prompt, executes with instructions_override
- Register launch_agent as MCP tool in __main__.py
- Update server instructions to document both tools
- 7 tests for read_agent_instructions (success, whitespace, path errors, unicode, relative)
- 9 tests for build_layered_prompt (layers, order, dedup, empty, read errors)
- Uses tmp_path fixture for all filesystem operations
…n + platform skip
Accepted from GLM-4.7 review of Qwen-Next code:
- read_agent_instructions() now raises ValueError if AGENT.md is empty after stripping
- Added test_empty_agent_md test case
- Added @pytest.mark.skipif for Windows on chmod-based test
Problem: The check_providers MCP tool needs typed request/response models to ensure consistent serialization across both the happy path (.model_dump()) and the error path (manual dict). Without a shared schema, the two paths can drift — the error path might include fields the model doesn't define, or omit fields the client expects.

Approach: Add ProviderHealthStatus (per-provider status with up/down/partial enum, model list, latency, error) and CheckProvidersResponse (aggregated result with provider dict, counters, total time). Include an explicit `error: Optional[str]` field on CheckProvidersResponse so the error path can return the same shape as the happy path instead of ad-hoc dict keys.
…ctor
Problem: No way to check if providers (OpenAI, Anthropic, Ollama, LM Studio, Z.ai) are reachable and which models they serve. Additionally, the local provider connection config (URLs, endpoints, model extractors) was duplicated 4 times across validate_provider_setup, validate_provider_setup_async, and _check_provider_health for ollama/lmstudio — a maintenance hazard. The AVAILABLE_MODELS and ZAI_AVAILABLE_MODELS imports were unnecessarily deferred inside function bodies despite constants.py having zero internal imports.

Approach: Extract LOCAL_PROVIDER_CONFIG as a module-level constant mapping provider name to (base_url, endpoint, extractor_fn, start_hint) tuple. All 4 call sites now reference this single source of truth. Hoist AVAILABLE_MODELS and ZAI_AVAILABLE_MODELS imports to top-level, removing 3 deferred imports. Add _check_provider_health() for per-provider async health checks using httpx, and check_all_providers_async() that runs all 5 checks concurrently via asyncio.gather with return_exceptions=True so one failure doesn't abort others.
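The gather-with-exceptions pattern above can be shown in miniature. The check function here is a stub standing in for the real `_check_provider_health()` (which does an httpx GET); the report shape is illustrative:

```python
import asyncio

async def check_one(name: str, fail: bool = False) -> dict:
    # Stand-in for _check_provider_health(): the real version hits the
    # provider's base URL with httpx.AsyncClient.
    if fail:
        raise ConnectionError(f"{name} unreachable")
    return {"provider": name, "status": "up"}

async def check_all(providers: dict) -> dict:
    results = await asyncio.gather(
        *(check_one(p, fail) for p, fail in providers.items()),
        return_exceptions=True,  # one failure must not abort the rest
    )
    report = {}
    for name, res in zip(providers, results):
        if isinstance(res, Exception):
            report[name] = {"provider": name, "status": "down", "error": str(res)}
        else:
            report[name] = res
    return report
```

With `return_exceptions=True`, a raised `ConnectionError` comes back as a value in the results list instead of cancelling the sibling checks.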
Problem: The health check engine (check_all_providers_async) exists in provider_config but is not exposed as an MCP tool. Callers have no way to invoke it through the MCP protocol. Also, the previous US-002 draft left a dead CheckProvidersResponse import in nano_agent.py that was never referenced by name (check_providers returns Dict[str, Any]).

Approach: Add check_providers() async function in nano_agent.py as the MCP tool wrapper — it delegates to check_all_providers_async(), converts the Pydantic response to dict via .model_dump(), and handles errors by returning a dict with the same shape (success=False, error=str). Register it in __main__.py via mcp.tool()(check_providers). Remove the unused CheckProvidersResponse import since the tool returns plain dicts for MCP protocol compatibility.
Problem: The health check feature had zero test coverage. Code review identified 6 specific gaps: LM Studio empty models and timeout paths, Ollama timeout path, OpenAI unexpected HTTP status (e.g. 500), the check_providers error path when the orchestrator throws, and unknown provider handling. Without these, regressions in error paths would go undetected.

Approach: 28 async pytest tests organized by provider section. Each test mocks httpx.AsyncClient (or os.getenv for API keys) to simulate specific provider states without network calls. Coverage spans: OpenAI states (up, 401, unreachable, timeout, missing key, 500), 2 Anthropic, 2 Z.ai, Ollama states (up, partial, down, empty, timeout), LM Studio states (up, down, empty, timeout), 7 concurrent execution scenarios (all up, mixed counters, timing verification, one exception, all down, schema validation), 2 MCP tool integration (happy path, error path), and 1 unknown provider.
Adds launch_agent() MCP tool with identity-aware execution via AGENT.md files.
Combines US-002 (provider health check) with US-010 (launch_agent) already on main. Resolves import/registration conflicts in __main__.py and nano_agent.py to include all three MCP tools: prompt_nano_agent, launch_agent, and check_providers.
The "run_command" name misleads LLM agents into thinking the tool only handles single commands. Renaming to "bash" signals that chained commands (&&, ;, |), scripts, and pipelines are supported — matching Claude Code's Bash tool naming convention. Update the system prompt tool documentation and rules section to reference bash() with expanded usage examples including chained commands.
Three problems with the old run_command tool: 1. Name misleads agents — they don't realize they can chain with &&, ;, pipes 2. Output truncated at 8K chars — test suites and build logs are often 10-20K 3. CWD resets every call — agents must use absolute paths everywhere Rename the function to bash (the @function_tool decorator auto-derives the tool name from the Python function name). Increase output cap from 8K to 30K chars with proportional 60/35 head/tail split via named constants. Add persistent CWD tracking via a ContextVar (_bash_cwd_var) that survives across bash calls within a session. A shell wrapper appends a unique marker and pwd after each command, which _parse_cwd_from_output() extracts and stores. The marker is stripped before returning output to the agent. Exit codes are preserved via a saved $? variable. set_workspace() resets CWD tracking when a new agent session begins.
Covers four areas of the run_command → bash migration:
- Constants: TOOL_BASH exists, AVAILABLE_TOOLS updated, system prompt refs bash
- Function: bash in tool list, basic command execution
- Output cap: no truncation under 30K, truncation over 30K, head/tail preserved
- Persistent CWD: defaults to workspace, persists after cd, unchanged after failed cd, marker stripped from output, exit code preserved, concurrent task isolation via ContextVar, reset on set_workspace
Uses FunctionTool.on_invoke_tool API for integration tests to exercise the full @function_tool decorator path.
Sync CLAUDE.md and KNOWLEDGE_TRANSFER.md with the renamed bash tool. Historical git commit references in KNOWLEDGE_TRANSFER.md are left unchanged since they describe what the original commit messages said.
Narrative release note covering the three problems solved (misleading name, 8K output cap, CWD reset), the approach taken for each, what was NOT changed and why, live verification results, and full test coverage summary.
Rename run_command → bash, increase output cap 8K → 30K, add persistent CWD tracking via ContextVar. 19 tests, zero regressions, live-verified with Qwen3-Coder 30B.
Mark bash tool rename (PR #1) as shipped in Phase 1 table. Update US-002 Provider Health Check to shipped with all acceptance criteria checked. Refresh tool references from run_command to bash in baseline table and architecture diagram. Update dependency graph — US-003 now unblocked.
Add git_status, git_commit, git_branch, git_diff tools (8→12 total). Safety guards block force push, hard reset, git clean, protected branch deletion, checkout-dot, and alias bypass. 35 tests all passing.
- test_bash_tool.py: 4 new process-group tests (TestBashProcessGroups)
- test_bash_background.py: 21 new tests across 7 classes
- All 25 new tests fail (ImportError) — production code not yet implemented
- 19 existing tests unaffected
- Add TOOL_BASH_BACKGROUND = "bash_background" constant
- Update AVAILABLE_TOOLS from 12 to 13 items
- Update NANO_AGENT_SYSTEM_PROMPT with bash_background docs and tool count
Tier 1 — Fix orphan process bug:
- Add start_new_session=True to bash() subprocess creation
- Replace proc.kill() with _kill_process_tree() using os.killpg
- Add _HAS_PROCESS_GROUPS platform guard for Windows compat
Tier 3 — Add bash_background tool (#13):
- New @function_tool with PID tracking via ContextVar
- Output capture to workspace file (agent reads via read_file)
- Process limit (MAX_BACKGROUND_PROCESSES=5) with dead-proc pruning
- Graceful cleanup: SIGTERM → 3s wait → SIGKILL
- Crash recovery: set_workspace() kills leftover PIDs
- Zombie reaping with os.waitpid(WNOHANG)
- Wrap Runner.run() in try/finally with asyncio.shield
- Import _cleanup_background_processes from nano_agent_tools
- Ensures background processes are killed when agent completes or fails
- asyncio.shield prevents CancelledError from aborting cleanup
- Fix asyncio.run inside event loop in context isolation test
- Fix ProcessLookupError in process groups edge case test
- Update git_tools test assertions from 12 to 13 tools
…5-nano GPT-5 family models all reject custom temperature via the OpenAI API. gpt-5 already had supports_temperature=False, but gpt-5-mini and gpt-5-nano were missing it, causing API errors in 17 tests.
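A capability-gated request builder avoids this class of error by omitting the parameter entirely rather than sending a value the API rejects. The table and helper here are illustrative; the real flags live in nano-agent's model config:

```python
# Illustrative capability table (the real one covers more fields).
MODEL_CAPABILITIES = {
    "gpt-5": {"supports_temperature": False},
    "gpt-5-mini": {"supports_temperature": False},
    "gpt-5-nano": {"supports_temperature": False},
    "gpt-4o": {"supports_temperature": True},
}

def build_request_kwargs(model: str, temperature: float = 0.2) -> dict:
    caps = MODEL_CAPABILITIES.get(model, {})
    kwargs = {"model": model}
    # Drop temperature for models that reject it instead of 400-ing.
    if caps.get("supports_temperature", True):
        kwargs["temperature"] = temperature
    return kwargs
```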
…icing key
- Delete test_track_tool and test_checkpoint (test nonexistent methods)
- Remove track_tool/tool_usage references from test_reset, test_generate_report, test_report_to_dict, test_report_format_summary
- Fix output_cost assertion in test_generate_report (reasoning tokens add to cost)
- Fix MODEL_PRICING key from "gpt-oss" to "ollama" in test_oss_models_free
- Fix Ollama base_url: localhost to 127.0.0.1 (matches production)
- Fix error message casing: "Ollama" to "ollama" (matches interpolation)
- Fix tracing assertion: set_tracing_disabled(True) not False (global disable)
Also add skipif for ANTHROPIC_API_KEY so it skips cleanly in CI.
…ipif
- Add skipif(OPENAI_API_KEY) to TestExecuteNanoAgent and TestIntegration classes
- Add skipif to test_prompt_nano_agent_basic and test_prompt_nano_agent_default_parameters
- Fix invalid provider assertion for 6-provider Pydantic Literal error
- Fix available_models assertion for dict structure (keyed by provider)
- Add assertions for provider count (>=3) and tool count (==13)
- Fix turns_used assertion (can be None when result has no messages)
- Fix timestamp assertion (async path uses 'model' key, not 'timestamp')
Matches production code (provider_config.py) which uses 127.0.0.1 to avoid IPv4/IPv6 dual-instance bug on macOS.
- test_model_capabilities: gpt-5-mini temperature is now None (not 0.2) because supports_temperature=False was added in the production fix
- test_nano_agent_integration: turns_used can be None when result lacks messages attribute — check key exists instead of comparing to int
Production fix: add supports_temperature=False to gpt-5-mini and gpt-5-nano. Test fixes: remove dead API references, fix stale assertions, add skipif guards, fix URL/casing/tracing mismatches, add missing asyncio decorator. Result: 0 failed, 464 passed, 2 skipped (missing ANTHROPIC_API_KEY).
…silience
A1: Restrict CORS to localhost origins (server.py)
- Changed allow_origins from ["*"] to localhost:8484 + 127.0.0.1:8484
- Prevents malicious websites from calling the dashboard API
A2: Add workspace focus boundary to file tools (nano_agent_tools.py)
- Added resolve().relative_to(workspace) check to read_file_raw,
write_file_raw, and edit_file_raw — matching existing pattern
in search_files_raw and _raw_run_tests
- Agent discipline: keeps agents focused on project directory
A3: Make background process cleanup resilient to cancellation
- Added _force_kill_remaining() synchronous fallback
- _cleanup_background_processes catches asyncio.CancelledError
and force-kills remaining processes to prevent orphans
Tests: 11 new in test_security_fixes.py, 21 existing tests updated
with workspace fixtures. 475 passed, 2 skipped, 0 failed.
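The A2 boundary check pattern can be sketched as follows. The function name is hypothetical; the real file tools inline the same `resolve().relative_to(workspace)` test:

```python
from pathlib import Path

def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve a path and refuse anything that escapes the workspace."""
    candidate = (workspace / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is outside workspace,
        # which also defeats ../ traversal because resolve() collapses it first.
        candidate.relative_to(workspace.resolve())
    except ValueError:
        raise PermissionError(f"path escapes workspace: {user_path}")
    return candidate
```

As the commit notes, this is agent discipline (keeping tools scoped to the project directory) rather than a hard sandbox — the bash tool can still reach elsewhere.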
Ship starter templates that define roles independently from models. Users copy, customize model/provider, and install to ~/.claude/agents/ or .claude/agents/. Based on Anthropic's official agent file spec and best practices, reviewed by Z.ai (GLM-5) and LM Studio (qwen3-coder-next).
Templates: nano-reviewer, nano-researcher, nano-implementer, nano-teammate
Skill: nano-dispatch (slash command for quick one-off dispatch)
- Add memory: user to reviewer and teammate templates
- Add mcpServers: nano-agent to teammate frontmatter
- Add --agents CLI flag for session-only testing
- Document multi-agent coexistence (no overwriting)
- Add spawn-time context guidance for agent teams
- Document all 3 MCP tools (including launch_agent)
- Add background agent note with MCP caveat
- State nano-agent ownership boundary explicitly
- Note that templates can be renamed (name field, not filename)
Templates are now bundled inside the Python package at src/nano_agent/templates/ for runtime discoverability via importlib.resources. The repo-root templates/ directory is replaced with a README pointing to the new location. This follows PyPA best practice: data files that the code needs at runtime must live inside the package boundary.
Identity templates provide persistent worker personas for the launch_agent MCP tool. Each defines a role (general-coder, code-reviewer, tdd-engineer, backend-expert) without duplicating the 13-tool listing already in the base system prompt. Unlike Claude Code agent templates (YAML frontmatter), these are plain Markdown files consumed by the agent_identity module.
Decision guide helps users and Claude Code choose between MCP tools, teammates, subagents, background agents, and skills. Recipes provide step-by-step workflows for the 5 scenarios we have actually tested: MCP dispatch, teammate collaboration, background bash, launch_agent identity, and skill dispatch.
New module template_resources.py provides load_template() for reading bundled template files via importlib.resources, and register_template_resources() for wiring them as MCP resources. Tests cover: all template categories, missing files, invalid categories, path traversal, registry completeness, manifest structure, custom URI scheme, MCP resource registration. 16 tests, all passing. Zero regressions on 491 existing tests.
Templates are now discoverable via MCP resources protocol. Claude Code sessions can list and read all templates without needing the nano-agent repo cloned. Also updates the MCP server description with an abbreviated "When to Use What" guide and resource discovery instructions.
…e guides
Rewrote all 5 recipes to use correct MCP tool call syntax instead of
fake Python imports, removed non-existent parameters (context={},
working_dir, .dispatch()), and made all examples provider-agnostic
with YOUR_MODEL/YOUR_PROVIDER placeholders.
Created 3 new guides: installation.md, multi-instance.md, and
team-patterns.md. Updated when-to-use-what.md with goal-oriented
lookup table. Registered 3 new entries in TEMPLATE_REGISTRY (23 total).
… providers
Added shared conftest.py with Ollama auto-detection (127.0.0.1, not localhost, to avoid IPv4/IPv6 dual-instance bug). All integration tests now use the smallest available local model via the ollama_model fixture. Fixed GPT-5 production tests to skip gracefully on 404 and corrected invalid reasoning_effort="minimal" to "none". Cleaned up unused imports.
Result: 486 passed, 0 failed, 14 skipped.
Summary
What's Included
- templates/agents/README.md
- templates/agents/nano-reviewer.md
- templates/agents/nano-researcher.md
- templates/agents/nano-implementer.md
- templates/agents/nano-teammate.md
- templates/skills/README.md
- templates/skills/nano-dispatch/SKILL.md (/nano-dispatch <task> slash command)
Design Decisions
- model: inherit in frontmatter, YOUR_MODEL/YOUR_PROVIDER placeholders in dispatch calls
- Frontmatter fields (name, description, tools, model; allowed-tools for skills)
- Model names match constants.py (including date-suffixed Anthropic models)
Review Findings Addressed
- check_providers() ambiguous syntax
- model: sonnet hardcoded in teammate → model: inherit
- launch_agent listed but unexplained
- allowed-tools vs tools confusion
- workspace placeholder unclear → workspace="" with comment
Test plan
- Copy nano-reviewer.md to ~/.claude/agents/ and verify it appears in /agents
- Copy nano-dispatch/ to ~/.claude/skills/ and verify /nano-dispatch appears
- Verify model names against constants.py AVAILABLE_MODELS
- TeamCreate