diff --git a/Dockerfile b/Dockerfile
index dcd088b..6a27ce3 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -45,6 +45,10 @@ RUN uv pip install --system -r /app/requirements.txt
 # Copy application code
 COPY . /app/
 
+# Pre-create /workspaces so named-volume mounts inherit correct permissions
+# (without this, Docker creates it as root read-only on fresh deployments)
+RUN mkdir -p /workspaces && chmod 777 /workspaces
+
 EXPOSE 8003
 
 ENV PORT=8003 \
diff --git a/docs/deployment.md b/docs/deployment.md
new file mode 100644
index 0000000..3d42c3d
--- /dev/null
+++ b/docs/deployment.md
@@ -0,0 +1,179 @@
+# Deployment Guide
+
+This guide covers deploying SWE-AF on a new server, including prerequisites, known issues, and quick-start instructions.
+
+## Prerequisites
+
+### Software
+
+| Requirement | Minimum Version | Notes |
+|---|---|---|
+| Docker | 20.10+ | With BuildKit support |
+| Docker Compose | 2.0+ | V2 plugin (`docker compose`, not `docker-compose`) |
+| Git | 2.30+ | For cloning the repository |
+
+### Environment Variables
+
+Copy `.env.example` to `.env` and configure at least one authentication method:
+
+```bash
+cp .env.example .env
+```
+
+**Required (one of):**
+
+| Variable | Purpose |
+|---|---|
+| `ANTHROPIC_API_KEY` | Anthropic API key for Claude models |
+| `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code subscription token (uses Pro/Max credits) |
+
+**For open-source models (alternative to Claude):**
+
+| Variable | Purpose |
+|---|---|
+| `OPENROUTER_API_KEY` | OpenRouter API key (200+ models) |
+| `OPENAI_API_KEY` | OpenAI API key |
+| `GOOGLE_API_KEY` | Google Gemini API key |
+
+**Optional:**
+
+| Variable | Purpose | Default |
+|---|---|---|
+| `GH_TOKEN` | GitHub PAT with `repo` scope for draft PRs | *(none)* |
+| `AGENTFIELD_SERVER` | Control plane URL | `http://control-plane:8080` (Docker) |
+| `NODE_ID` | Agent node identifier | `swe-planner` |
+| `PORT` | Agent listen port | `8003` |
+
+### Package Versions
+
+| Package | Minimum Version | Notes |
+|---|---|---|
+| `agentfield` | 0.1.67+ | Python SDK (includes opencode v1.4+ fix) |
+| `claude-agent-sdk` | 0.1.20+ | Claude runtime |
+| opencode CLI | 1.4+ | Only if using `open_code` runtime (see Known Issues) |
+
+## Quick Start
+
+### Full Stack (control plane + agent)
+
+```bash
+git clone https://github.com/Agent-Field/SWE-AF
+cd SWE-AF
+cp .env.example .env  # fill in API keys
+docker compose up -d
+```
+
+This starts:
+
+- **control-plane** on `:8080` — AgentField orchestration server
+- **swe-agent** on `:8003` — SWE-AF full pipeline (`swe-planner` node)
+- **swe-fast** on `:8004` — SWE-AF fast mode (`swe-fast` node)
+
+### Agent Only (connect to existing control plane)
+
+If you already have an AgentField control plane running:
+
+```bash
+git clone https://github.com/Agent-Field/SWE-AF
+cd SWE-AF
+cp .env.example .env  # fill in API keys
+
+# Set AGENTFIELD_SERVER in .env to your control plane URL
+docker compose -f docker-compose.local.yml up -d
+```
+
+### Verify Deployment
+
+```bash
+# Check agent health
+curl http://localhost:8003/health
+
+# Check control plane (full stack only)
+curl http://localhost:8080/api/v1/health
+```
+
+## Known Issues and Fixes
+
+### `/workspaces` read-only filesystem error
+
+**Symptom:**
+
+```
+[Errno 30] Read-only file system: '/workspaces'
+```
+
+**Root cause:** The `/workspaces` directory was not pre-created in the Docker image. When Docker mounts a named volume over a path that does not exist in the image, it creates the directory as root with restrictive permissions.
+
+**Fix:** This is fixed in the current Dockerfile. If you're using an older image, rebuild:
+
+```bash
+docker compose build --no-cache
+```
+
+The fix adds `RUN mkdir -p /workspaces && chmod 777 /workspaces` to the Dockerfile before the volume mount point.
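The two commands in the fix can be sanity-checked outside Docker; the behavior is plain filesystem semantics, shown here on a scratch path (`/tmp/workspaces-demo` is illustrative, not a path the project uses):

```shell
# Same two steps the Dockerfile runs at build time, on a throwaway directory.
# mkdir -p is idempotent (safe to re-run); chmod 777 makes the directory
# writable by whichever user the container process happens to run as.
mkdir -p /tmp/workspaces-demo
chmod 777 /tmp/workspaces-demo
stat -c '%a' /tmp/workspaces-demo   # prints the octal mode: 777
```

Because the directory now exists in the image with these permissions before any volume is mounted, a fresh named volume inherits them instead of being created root-owned at mount time.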
+
+**Ref:** [#46](https://github.com/Agent-Field/SWE-AF/issues/46)
+
+### `Product manager failed to produce a valid PRD` with `open_code` runtime
+
+**Symptom:** Builds using the `open_code` runtime fail at the Product Manager step with a generic error. The agent completes in a few seconds (too fast for real work).
+
+**Root cause:** opencode CLI v1.4+ changed its CLI interface:
+
+- the `-p` (prompt) flag was removed — the prompt is now a positional argument to the `run` subcommand
+- `-c` now means `--continue` (resume session), not the project directory
+
+**Fix:** Upgrade the `agentfield` Python SDK to a version that includes the opencode v1.4+ compatibility fix:
+
+```bash
+pip install --upgrade agentfield
+```
+
+**Ref:** [#45](https://github.com/Agent-Field/SWE-AF/issues/45)
+
+### Fatal API errors silently retry
+
+**Symptom:** A build with exhausted credits or an invalid API key retries multiple times before failing with a misleading error (e.g., "Product manager failed to produce a valid PRD").
+
+**Root cause:** Non-retryable API errors (credit exhaustion, invalid key) were not distinguished from transient errors, so every retry layer fired.
+
+**Fix:** This is fixed in the current version. Upgrade to get `FatalHarnessError` detection, which immediately aborts on:
+
+- Credit balance too low
+- Invalid API key
+- Authentication failed
+- Account disabled
+- Quota exceeded
+
+**Ref:** [#49](https://github.com/Agent-Field/SWE-AF/issues/49)
+
+### Parallel builds cross-contamination
+
+**Symptom:** Running two builds simultaneously for the same repository causes agents to receive input from the wrong build.
+
+**Root cause:** Both builds cloned to the same workspace path (`/workspaces/<repo-name>`), sharing git state and artifacts.
+
+**Fix:** This is fixed in the current version. Each build now gets an isolated workspace: `/workspaces/<repo-name>-<build-id>`.
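The isolation scheme is simple to sketch: a short random build id is appended to the repo name derived from the URL. The names below are illustrative stand-ins for the project's `_repo_name_from_url` helper and the path logic in `swe_af/app.py`, not its exact code:

```python
import uuid
from urllib.parse import urlparse


def repo_name_from_url(repo_url: str) -> str:
    # Last path segment of the URL, minus a trailing ".git"
    name = urlparse(repo_url).path.rstrip("/").split("/")[-1]
    return name[:-4] if name.endswith(".git") else name


def build_scoped_workspace(repo_url: str) -> str:
    build_id = uuid.uuid4().hex[:8]  # unique per build
    return f"/workspaces/{repo_name_from_url(repo_url)}-{build_id}"


# Two builds of the same repo land in different directories,
# so git state, artifacts, and worktrees never collide:
a = build_scoped_workspace("https://github.com/example/my-repo.git")
b = build_scoped_workspace("https://github.com/example/my-repo.git")
assert a != b
assert a.startswith("/workspaces/my-repo-")
```

Since artifact directories are resolved relative to the repo path, scoping the clone directory by build id isolates the artifacts as a side effect.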
+
+**Ref:** [#43](https://github.com/Agent-Field/SWE-AF/issues/43)
+
+## Scaling
+
+### Multiple concurrent builds
+
+Each build automatically gets an isolated workspace. To run multiple builds concurrently:
+
+```bash
+# Scale the agent service
+docker compose up --scale swe-agent=3 -d
+```
+
+### Resource considerations
+
+Each build clones the target repository and runs multiple LLM calls. Plan for:
+
+- **Disk:** ~500MB per concurrent build (repo clone + artifacts)
+- **Memory:** ~512MB per agent container
+- **Network:** LLM API calls are the bottleneck, not compute
+
+## Troubleshooting
+
+| Symptom | Check |
+|---|---|
+| Agent not registering with control plane | Verify `AGENTFIELD_SERVER` is reachable from the container |
+| Builds timing out | Check API key validity and credit balance |
+| `git clone` failures | Verify `GH_TOKEN` has `repo` scope for private repositories |
+| Health check failing | Check container logs: `docker compose logs swe-agent` |
diff --git a/requirements-docker.txt b/requirements-docker.txt
index d5cd7a2..700dcd7 100644
--- a/requirements-docker.txt
+++ b/requirements-docker.txt
@@ -2,6 +2,6 @@
 #
 # Same runtime dependencies as requirements.txt.
 
-agentfield>=0.1.41
+agentfield>=0.1.67
 pydantic>=2.0
 claude-agent-sdk==0.1.20
diff --git a/requirements.txt b/requirements.txt
index d2cbc81..afce4f4 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,6 +2,6 @@
 #
 # Install: python -m pip install -r requirements.txt
 
-agentfield>=0.1.9
+agentfield>=0.1.67
 pydantic>=2.0
 claude-agent-sdk==0.1.20
diff --git a/swe_af/app.py b/swe_af/app.py
index 1f7693d..56cbd75 100644
--- a/swe_af/app.py
+++ b/swe_af/app.py
@@ -195,18 +195,30 @@ async def build(
     if repo_url:
         cfg.repo_url = repo_url
 
-    # Auto-derive repo_path from repo_url when not specified
+    # Generate build_id BEFORE workspace setup so each concurrent build
+    # gets a fully isolated workspace (repo clone, artifacts, worktrees).
+ # Fixes cross-contamination when parallel builds target the same repo. + # Ref: https://github.com/Agent-Field/SWE-AF/issues/43 + build_id = uuid.uuid4().hex[:8] + + # Auto-derive repo_path from repo_url when not specified. + # Each build gets its own clone directory scoped by build_id to prevent + # concurrent builds from sharing git state, artifacts, or worktrees. if cfg.repo_url and not repo_path: - repo_path = f"/workspaces/{_repo_name_from_url(cfg.repo_url)}" + repo_name = _repo_name_from_url(cfg.repo_url) + repo_path = f"/workspaces/{repo_name}-{build_id}" # Multi-repo: derive repo_path from primary repo; _clone_repos handles cloning later if not repo_path and len(cfg.repos) > 1: primary = next((r for r in cfg.repos if r.role == "primary"), cfg.repos[0]) - repo_path = f"/workspaces/{_repo_name_from_url(primary.repo_url)}" + repo_name = _repo_name_from_url(primary.repo_url) + repo_path = f"/workspaces/{repo_name}-{build_id}" if not repo_path: raise ValueError("Either repo_path or repo_url must be provided") + app.note(f"Build starting (build_id={build_id})", tags=["build", "start"]) + # Clone if repo_url is set and target doesn't exist yet git_dir = os.path.join(repo_path, ".git") if cfg.repo_url and not os.path.exists(git_dir): @@ -222,8 +234,8 @@ async def build( app.note(f"Clone failed (exit {clone_result.returncode}): {err}", tags=["build", "clone", "error"]) raise RuntimeError(f"git clone failed (exit {clone_result.returncode}): {err}") elif cfg.repo_url and os.path.exists(git_dir): - # Repo already cloned by a prior build — reset to remote default branch - # so git_init creates the integration branch from a clean baseline. + # Repo already exists at this build-scoped path (unlikely but handle gracefully). + # Reset to remote default branch for a clean baseline. 
default_branch = cfg.github_pr_base or "main" app.note( f"Repo already exists at {repo_path} — resetting to origin/{default_branch}", @@ -290,12 +302,6 @@ async def build( # Resolve runtime + flat model config once for this build. resolved = cfg.resolved_models() - # Unique ID for this build — namespaces git branches/worktrees to prevent - # collisions when multiple builds run concurrently on the same repository. - build_id = uuid.uuid4().hex[:8] - - app.note(f"Build starting (build_id={build_id})", tags=["build", "start"]) - # Compute absolute artifacts directory path for logging abs_artifacts_dir = os.path.join(os.path.abspath(repo_path), artifacts_dir) diff --git a/swe_af/execution/coding_loop.py b/swe_af/execution/coding_loop.py index 0e1e334..9790e9e 100644 --- a/swe_af/execution/coding_loop.py +++ b/swe_af/execution/coding_loop.py @@ -22,6 +22,7 @@ from typing import Callable +from swe_af.execution.fatal_error import FatalHarnessError from swe_af.execution.schemas import ( DAGState, ExecutionConfig, @@ -325,6 +326,8 @@ async def _run_default_path( timeout=timeout, label=f"review:{issue_name}:default", ) + except FatalHarnessError: + raise except Exception as e: if note_fn: note_fn( @@ -437,6 +440,8 @@ async def _run_flagged_path( tags=["coding_loop", "review_error", issue_name], ) review_result = {"approved": True, "blocking": False, "summary": f"Review unavailable: {review_result}"} + except FatalHarnessError: + raise except Exception as e: if note_fn: note_fn( @@ -479,6 +484,8 @@ async def _run_flagged_path( timeout=timeout, label=f"synthesizer:{issue_name}:iter{iteration}", ) + except FatalHarnessError: + raise except Exception as e: if note_fn: note_fn( @@ -625,6 +632,8 @@ async def run_coding_loop( timeout=timeout, label=f"coder:{issue_name}:iter{iteration}", ) + except FatalHarnessError: + raise except Exception as e: if note_fn: note_fn( diff --git a/swe_af/execution/dag_executor.py b/swe_af/execution/dag_executor.py index bcb6600..4a8a5d5 100644 --- 
a/swe_af/execution/dag_executor.py +++ b/swe_af/execution/dag_executor.py @@ -11,6 +11,7 @@ from swe_af.execution.dag_utils import apply_replan, find_downstream from swe_af.execution.envelope import unwrap_call_result +from swe_af.execution.fatal_error import FatalHarnessError from swe_af.execution.schemas import ( AdvisorAction, DAGState, @@ -576,6 +577,8 @@ async def _cleanup_single_repo( f"cleaned={result.get('cleaned', [])}", tags=["execution", "worktree_cleanup", "warning"], ) + except FatalHarnessError: + raise except Exception as e: if note_fn: note_fn( @@ -868,6 +871,8 @@ async def _execute_single_issue( timeout=config.agent_timeout_seconds, label=f"issue_advisor:{issue_name}:{advisor_round + 1}", ) + except FatalHarnessError: + raise except Exception as e: if note_fn: note_fn( @@ -1064,6 +1069,8 @@ async def _run_execute_fn( attempts=attempt, ) + except FatalHarnessError: + raise except Exception as e: last_error = str(e) last_context = traceback.format_exc() @@ -1095,6 +1102,8 @@ async def _run_execute_fn( "retry_diagnosis": advice.get("diagnosis", ""), } continue + except FatalHarnessError: + raise except Exception: continue elif attempt <= config.max_retries_per_issue: diff --git a/swe_af/execution/envelope.py b/swe_af/execution/envelope.py index c3f393d..1264af9 100644 --- a/swe_af/execution/envelope.py +++ b/swe_af/execution/envelope.py @@ -12,6 +12,8 @@ from __future__ import annotations +from swe_af.execution.fatal_error import FatalHarnessError, is_fatal_error + # Keys present in the execution envelope returned by _build_execute_response. 
_ENVELOPE_KEYS = frozenset({ "execution_id", "run_id", "node_id", "type", "target", @@ -51,6 +53,8 @@ def unwrap_call_result(result, label: str = "call"): status = str(result.get("status", "")).lower() if status in ("failed", "error", "cancelled", "timeout"): err = result.get("error_message") or result.get("error") or "unknown" + if is_fatal_error(str(err)): + raise FatalHarnessError(str(err)) raise RuntimeError(f"{label} failed (status={status}): {err}") inner = result.get("result") diff --git a/swe_af/execution/fatal_error.py b/swe_af/execution/fatal_error.py new file mode 100644 index 0000000..3b29735 --- /dev/null +++ b/swe_af/execution/fatal_error.py @@ -0,0 +1,82 @@ +"""Fatal API error detection for non-retryable harness failures. + +When the underlying API returns a fatal error (billing exhaustion, invalid +credentials, disabled account), retrying is pointless and wastes time. This +module provides: + +- ``FatalHarnessError`` — a distinct exception type that short-circuits all + retry layers. +- ``check_fatal_harness_error()`` — inspects a HarnessResult's error_message + and raises ``FatalHarnessError`` immediately on match. + +Ref: https://github.com/Agent-Field/SWE-AF/issues/49 +""" + +from __future__ import annotations + +import re + +# Patterns that indicate a non-retryable API failure. +# Matched case-insensitively against error_message strings. +_FATAL_PATTERNS: tuple[re.Pattern[str], ...] = tuple( + re.compile(p, re.IGNORECASE) + for p in ( + r"credit balance is too low", + r"insufficient.{0,20}credits?", + r"billing.{0,20}(expired|inactive|suspended)", + r"invalid.{0,10}api.?key", + r"invalid.{0,10}x-api-key", + r"(your )?api key is not valid", + r"authentication failed", + r"account has been disabled", + r"account.{0,10}is disabled", + r"unauthorized", + r"quota.{0,20}exceeded", + ) +) + + +class FatalHarnessError(RuntimeError): + """Raised when the harness encounters a non-retryable API error. 
+ + This exception is designed to propagate through all retry layers + (schema retries, SDK execution retries, pipeline retries) without + being caught by generic ``except Exception`` handlers that would + otherwise silently retry. + """ + + def __init__(self, message: str) -> None: + super().__init__(f"Fatal API error (non-retryable): {message}") + self.original_message = message + + +def is_fatal_error(error_message: str) -> bool: + """Return True if *error_message* matches a known fatal API error pattern.""" + if not error_message: + return False + return any(p.search(error_message) for p in _FATAL_PATTERNS) + + +def check_fatal_harness_error(result) -> None: + """Inspect a HarnessResult and raise ``FatalHarnessError`` if fatal. + + Should be called immediately after ``router.harness()`` returns, + *before* any ``result.parsed is None`` check, so the real error + message surfaces instead of a misleading generic one. + + Parameters + ---------- + result: + A ``HarnessResult`` (or any object with ``is_error`` and + ``error_message`` attributes). + + Raises + ------ + FatalHarnessError + If the result indicates a non-retryable API failure. 
+ """ + if not getattr(result, "is_error", False): + return + msg = getattr(result, "error_message", "") or "" + if is_fatal_error(msg): + raise FatalHarnessError(msg) diff --git a/swe_af/reasoners/execution_agents.py b/swe_af/reasoners/execution_agents.py index c9f5430..867e93a 100644 --- a/swe_af/reasoners/execution_agents.py +++ b/swe_af/reasoners/execution_agents.py @@ -10,6 +10,7 @@ from pydantic import BaseModel +from swe_af.execution.fatal_error import FatalHarnessError, check_fatal_harness_error from swe_af.execution.schemas import ( DEFAULT_AGENT_MAX_TURNS, AdvisorAction, @@ -163,6 +164,7 @@ async def run_retry_advisor( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Retry advisor: should_retry={result.parsed.should_retry}, " @@ -170,6 +172,8 @@ async def run_retry_advisor( tags=["retry_advisor", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Retry advisor agent failed: {e}", @@ -243,12 +247,15 @@ async def run_issue_advisor( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Issue advisor decision: {result.parsed.action.value} — {result.parsed.summary}", tags=["issue_advisor", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Issue advisor agent failed: {e}", @@ -322,6 +329,7 @@ async def run_replanner( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) # Log raw response for debugging (even on parse failure) if log_dir: raw_log = os.path.join( @@ -349,6 +357,8 @@ async def run_replanner( "Output ONLY valid JSON conforming to the ReplanDecision 
schema.\n\n" + task_prompt ) + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Replanner agent failed (attempt {attempt + 1}): {e}", @@ -433,12 +443,15 @@ class IssueWriterOutput(BaseModel): max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Issue writer complete for {issue_name}: {result.parsed.issue_file_path}", tags=["issue_writer", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Issue writer failed for {issue_name}: {e}", @@ -497,12 +510,15 @@ async def run_verifier( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Verifier complete: passed={result.parsed.passed}, summary={result.parsed.summary}", tags=["verifier", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Verifier agent failed: {e}", @@ -579,6 +595,7 @@ async def run_git_init( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Git init complete: mode={result.parsed.mode}, " @@ -586,6 +603,8 @@ async def run_git_init( tags=["git_init", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Git init agent failed: {e}", @@ -652,12 +671,15 @@ class WorkspaceSetupResult(BaseModel): max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Workspace setup complete: {len(result.parsed.workspaces)} 
worktrees created", tags=["workspace_setup", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Workspace setup agent failed: {e}", @@ -714,6 +736,7 @@ async def run_merger( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Merger complete: merged={result.parsed.merged_branches}, " @@ -722,6 +745,8 @@ async def run_merger( tags=["merger", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Merger agent failed: {e}", @@ -787,6 +812,7 @@ async def run_integration_tester( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Integration tester complete: passed={result.parsed.passed}, " @@ -794,6 +820,8 @@ async def run_integration_tester( tags=["integration_tester", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Integration tester agent failed: {e}", @@ -853,12 +881,15 @@ class WorkspaceCleanupResult(BaseModel): max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Workspace cleanup complete: {len(result.parsed.cleaned)} cleaned", tags=["workspace_cleanup", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Workspace cleanup agent failed: {e}", @@ -927,6 +958,7 @@ async def run_coder( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed 
is not None: router.note( f"Coder complete: {issue_name}, " @@ -937,6 +969,8 @@ async def run_coder( out = result.parsed.model_dump() out["iteration_id"] = iteration_id return out + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Coder agent failed: {issue_name}: {e}", @@ -1001,6 +1035,7 @@ async def run_qa( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"QA complete: {issue_name}, passed={result.parsed.passed}", @@ -1009,6 +1044,8 @@ async def run_qa( out = result.parsed.model_dump() out["iteration_id"] = iteration_id return out + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"QA agent failed: {issue_name}: {e}", @@ -1078,6 +1115,7 @@ async def run_code_reviewer( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Code reviewer complete: {issue_name}, " @@ -1088,6 +1126,8 @@ async def run_code_reviewer( out = result.parsed.model_dump() out["iteration_id"] = iteration_id return out + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Code reviewer agent failed: {issue_name}: {e}", @@ -1155,6 +1195,8 @@ async def run_qa_synthesizer( out = result.parsed.model_dump() out["iteration_id"] = iteration_id return out + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"QA synthesizer agent failed: {e}", @@ -1256,6 +1298,7 @@ class FixGeneratorOutput(BaseModel): max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Fix generator complete: {len(result.parsed.fix_issues)} fix issues, " @@ -1263,6 +1306,8 @@ class 
FixGeneratorOutput(BaseModel): tags=["fix_generator", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Fix generator agent failed: {e}", @@ -1320,6 +1365,7 @@ async def run_repo_finalize( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"Repo finalize complete: {len(result.parsed.files_removed)} files removed, " @@ -1327,6 +1373,8 @@ async def run_repo_finalize( tags=["repo_finalize", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"Repo finalize agent failed: {e}", @@ -1391,12 +1439,15 @@ async def run_github_pr( max_turns=DEFAULT_AGENT_MAX_TURNS, permission_mode=permission_mode or None, ) + check_fatal_harness_error(result) if result.parsed is not None: router.note( f"GitHub PR complete: {result.parsed.pr_url}", tags=["github_pr", "complete"], ) return result.parsed.model_dump() + except FatalHarnessError: + raise # Non-retryable — propagate immediately except Exception as e: router.note( f"GitHub PR agent failed: {e}", diff --git a/swe_af/reasoners/pipeline.py b/swe_af/reasoners/pipeline.py index 902adac..a38007f 100644 --- a/swe_af/reasoners/pipeline.py +++ b/swe_af/reasoners/pipeline.py @@ -14,6 +14,7 @@ from pydantic import BaseModel +from swe_af.execution.fatal_error import check_fatal_harness_error from swe_af.execution.schemas import DEFAULT_AGENT_MAX_TURNS from swe_af.reasoners.schemas import ( Architecture, @@ -202,6 +203,7 @@ async def run_product_manager( system_prompt=system_prompt, cwd=repo_path, ) + check_fatal_harness_error(result) if result.parsed is None: raise RuntimeError("Product manager failed to produce a valid PRD") @@ -261,6 +263,7 @@ async def run_architect( system_prompt=system_prompt, cwd=repo_path, ) + 
check_fatal_harness_error(result) if result.parsed is None: raise RuntimeError("Architect failed to produce a valid architecture") @@ -315,6 +318,7 @@ async def run_tech_lead( system_prompt=system_prompt, cwd=repo_path, ) + check_fatal_harness_error(result) if result.parsed is None: raise RuntimeError("Tech lead failed to produce a valid review") @@ -391,6 +395,7 @@ class SprintPlanOutput(BaseModel): system_prompt=system_prompt, cwd=repo_path, ) + check_fatal_harness_error(result) if result.parsed is None: raise RuntimeError("Sprint planner failed to produce valid issues") diff --git a/tests/test_build_isolation.py b/tests/test_build_isolation.py new file mode 100644 index 0000000..f7e8659 --- /dev/null +++ b/tests/test_build_isolation.py @@ -0,0 +1,105 @@ +"""Tests for parallel build workspace isolation. + +Validates that concurrent builds for the same repository get isolated +workspace directories, preventing cross-contamination of git state, +artifacts, and worktrees. + +Ref: https://github.com/Agent-Field/SWE-AF/issues/43 +""" + +from __future__ import annotations + +import re +from pathlib import Path + +import pytest + +APP_SOURCE = ( + Path(__file__).resolve().parent.parent / "swe_af" / "app.py" +).read_text() + + +class TestBuildWorkspaceIsolation: + """Issue #43: Parallel builds must not share workspace paths.""" + + def test_auto_derived_repo_path_includes_build_id(self) -> None: + """When repo_url is provided without repo_path, the derived path + must include the build_id to isolate concurrent builds.""" + assert re.search( + r'repo_path\s*=\s*f"/workspaces/\{repo_name\}-\{build_id\}"', + APP_SOURCE, + ), ( + "Auto-derived repo_path must include build_id suffix to isolate " + "concurrent builds (see issue #43)" + ) + + def test_build_id_generated_before_clone(self) -> None: + """build_id must be generated before the clone/workspace setup + so it can be used to scope the workspace path.""" + build_id_match = re.search(r'build_id = uuid\.uuid4\(\)', 
APP_SOURCE) + clone_match = re.search(r'git_dir = os\.path\.join\(repo_path', APP_SOURCE) + + assert build_id_match is not None, "build_id generation not found" + assert clone_match is not None, "clone logic not found" + assert build_id_match.start() < clone_match.start(), ( + "build_id must be generated BEFORE clone/workspace setup " + "to be available for path scoping (see issue #43)" + ) + + def test_multi_repo_path_includes_build_id(self) -> None: + """Multi-repo derived path must also include build_id.""" + multi_repo_start = APP_SOURCE.find("Multi-repo: derive") + assert multi_repo_start > 0, "Multi-repo section not found" + multi_repo_section = APP_SOURCE[multi_repo_start:multi_repo_start + 500] + assert re.search( + r'repo_path\s*=\s*f"/workspaces/\{repo_name\}-\{build_id\}"', + multi_repo_section, + ), ( + "Multi-repo derived repo_path must include build_id suffix (see issue #43)" + ) + + def test_two_builds_same_repo_get_different_paths(self) -> None: + """Simulate that two builds for the same repo_url produce different paths.""" + import uuid + from swe_af.execution.schemas import _derive_repo_name + + repo_url = "https://github.com/example/my-repo.git" + repo_name = _derive_repo_name(repo_url) + + build_id_a = uuid.uuid4().hex[:8] + build_id_b = uuid.uuid4().hex[:8] + + path_a = f"/workspaces/{repo_name}-{build_id_a}" + path_b = f"/workspaces/{repo_name}-{build_id_b}" + + assert path_a != path_b, ( + "Two builds for the same repo must produce different workspace paths" + ) + assert repo_name in path_a + assert repo_name in path_b + assert build_id_a in path_a + assert build_id_b in path_b + + def test_artifacts_dir_isolated_by_build_scoped_path(self) -> None: + """artifacts_dir is relative to repo_path, so build-scoped repo_path + means artifacts are also isolated.""" + import uuid + import os + from swe_af.execution.schemas import _derive_repo_name + + repo_url = "https://github.com/example/my-repo.git" + repo_name = _derive_repo_name(repo_url) + 
artifacts_dir = ".artifacts" + + build_id_a = uuid.uuid4().hex[:8] + build_id_b = uuid.uuid4().hex[:8] + + repo_path_a = f"/workspaces/{repo_name}-{build_id_a}" + repo_path_b = f"/workspaces/{repo_name}-{build_id_b}" + + artifacts_a = os.path.join(repo_path_a, artifacts_dir) + artifacts_b = os.path.join(repo_path_b, artifacts_dir) + + assert artifacts_a != artifacts_b, ( + "Artifacts dirs must differ between concurrent builds" + ) diff --git a/tests/test_deployment_docs.py b/tests/test_deployment_docs.py new file mode 100644 index 0000000..8e0f42a --- /dev/null +++ b/tests/test_deployment_docs.py @@ -0,0 +1,67 @@ +"""Tests for deployment documentation completeness. + +Validates that docs/deployment.md exists and contains required sections +covering prerequisites, known issues, and quick-start instructions. + +Ref: https://github.com/Agent-Field/SWE-AF/issues/48 +""" + +from __future__ import annotations + +from pathlib import Path + +import pytest + +DEPLOYMENT_DOC = Path(__file__).resolve().parent.parent / "docs" / "deployment.md" + + +@pytest.fixture(scope="module") +def doc_content() -> str: + assert DEPLOYMENT_DOC.exists(), ( + f"docs/deployment.md not found at {DEPLOYMENT_DOC}. " + "This file is required for deployment guidance (see issue #48)." 
+ ) + return DEPLOYMENT_DOC.read_text() + + +class TestDeploymentDocStructure: + """Issue #48: Deployment docs must cover key sections.""" + + def test_prerequisites_section(self, doc_content: str) -> None: + assert "## Prerequisites" in doc_content + + def test_environment_variables_documented(self, doc_content: str) -> None: + for var in ("ANTHROPIC_API_KEY", "GH_TOKEN", "AGENTFIELD_SERVER"): + assert var in doc_content, f"Missing env var documentation: {var}" + + def test_quick_start_section(self, doc_content: str) -> None: + assert "## Quick Start" in doc_content + + def test_quick_start_has_docker_compose(self, doc_content: str) -> None: + assert "docker compose" in doc_content + + def test_quick_start_has_env_setup(self, doc_content: str) -> None: + assert ".env.example" in doc_content + + def test_known_issues_section(self, doc_content: str) -> None: + assert "## Known Issues" in doc_content + + def test_known_issue_workspaces(self, doc_content: str) -> None: + """Issue #46 must be documented.""" + assert "/workspaces" in doc_content + assert "read-only" in doc_content.lower() or "Read-only" in doc_content + + def test_known_issue_opencode(self, doc_content: str) -> None: + """Issue #45 must be documented.""" + assert "opencode" in doc_content.lower() or "open_code" in doc_content + + def test_known_issue_fatal_errors(self, doc_content: str) -> None: + """Issue #49 must be documented.""" + assert "fatal" in doc_content.lower() or "credit" in doc_content.lower() + + def test_known_issue_parallel_builds(self, doc_content: str) -> None: + """Issue #43 must be documented.""" + assert "parallel" in doc_content.lower() or "concurrent" in doc_content.lower() + + def test_troubleshooting_section(self, doc_content: str) -> None: + assert "## Troubleshooting" in doc_content or "## Scaling" in doc_content diff --git a/tests/test_dockerfile.py b/tests/test_dockerfile.py new file mode 100644 index 0000000..00292a3 --- /dev/null +++ b/tests/test_dockerfile.py @@ -0,0 
+1,56 @@ +"""Tests for Dockerfile correctness. + +Validates that the Dockerfile contains required directives for correct +container behavior — particularly around directory pre-creation that +prevents read-only filesystem errors with named volume mounts. + +Ref: https://github.com/Agent-Field/SWE-AF/issues/46 +""" + +from __future__ import annotations + +import re +from pathlib import Path + +import pytest + +DOCKERFILE = Path(__file__).resolve().parent.parent / "Dockerfile" + + +@pytest.fixture(scope="module") +def dockerfile_content() -> str: + return DOCKERFILE.read_text() + + +class TestWorkspacesDirectory: + """Issue #46: /workspaces must be pre-created with write permissions.""" + + def test_mkdir_workspaces_exists(self, dockerfile_content: str) -> None: + """Dockerfile must create /workspaces before any volume mount.""" + assert re.search( + r"mkdir\s+-p\s+/workspaces", dockerfile_content + ), ( + "Dockerfile must contain 'mkdir -p /workspaces' to pre-create the " + "directory before named volume mounts (see issue #46)" + ) + + def test_chmod_workspaces(self, dockerfile_content: str) -> None: + """Dockerfile must set write permissions on /workspaces.""" + assert re.search( + r"chmod\s+\d*7\d*\s+/workspaces", dockerfile_content + ), ( + "Dockerfile must chmod /workspaces with world-writable permissions " + "so the running process can write to it (see issue #46)" + ) + + def test_workspaces_created_before_expose(self, dockerfile_content: str) -> None: + """mkdir /workspaces must appear before EXPOSE (i.e. 
in the build stage).""" + mkdir_match = re.search(r"mkdir\s+-p\s+/workspaces", dockerfile_content) + expose_match = re.search(r"^EXPOSE\s+", dockerfile_content, re.MULTILINE) + assert mkdir_match is not None and expose_match is not None, ( + "Both 'mkdir -p /workspaces' and 'EXPOSE' must exist in Dockerfile" + ) + assert mkdir_match.start() < expose_match.start(), ( + "/workspaces must be created before EXPOSE to ensure it's part of " + "the image layer before any volume mount" + ) diff --git a/tests/test_fatal_error.py b/tests/test_fatal_error.py new file mode 100644 index 0000000..355782a --- /dev/null +++ b/tests/test_fatal_error.py @@ -0,0 +1,185 @@ +"""Tests for fatal API error detection and propagation. + +Validates that non-retryable API errors (credit exhaustion, invalid API key, +disabled accounts) are detected and raised as FatalHarnessError, preventing +silent retries across all retry layers. + +Ref: https://github.com/Agent-Field/SWE-AF/issues/49 +""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import Optional + +import pytest + +from swe_af.execution.fatal_error import ( + FatalHarnessError, + check_fatal_harness_error, + is_fatal_error, +) + + +# --------------------------------------------------------------------------- +# is_fatal_error — pattern matching +# --------------------------------------------------------------------------- + + +class TestIsFatalError: + """Test that known fatal error patterns are detected.""" + + @pytest.mark.parametrize( + "message", + [ + "Credit balance is too low", + "credit balance is too low to run this request", + "Your API key is not valid", + "Invalid API key provided", + "invalid x-api-key", + "Authentication failed", + "authentication failed: check your credentials", + "Account has been disabled", + "account is disabled", + "Unauthorized", + "unauthorized access", + "Insufficient credits remaining", + "insufficient credits", + "billing expired", + "billing inactive", + 
"billing suspended", + "Quota exceeded for this model", + "quota has been exceeded", + ], + ) + def test_fatal_patterns_detected(self, message: str) -> None: + assert is_fatal_error(message), f"Should detect as fatal: {message!r}" + + @pytest.mark.parametrize( + "message", + [ + "Rate limit exceeded", + "Service temporarily unavailable", + "Internal server error", + "Connection reset by peer", + "timeout waiting for response", + "overloaded — try again later", + "Product manager failed to produce a valid PRD", + "", + "Some random error", + ], + ) + def test_transient_errors_not_fatal(self, message: str) -> None: + assert not is_fatal_error(message), f"Should NOT detect as fatal: {message!r}" + + def test_empty_and_none(self) -> None: + assert not is_fatal_error("") + assert not is_fatal_error(None) # type: ignore[arg-type] + + +# --------------------------------------------------------------------------- +# FatalHarnessError +# --------------------------------------------------------------------------- + + +class TestFatalHarnessError: + def test_is_runtime_error_subclass(self) -> None: + err = FatalHarnessError("Credit balance is too low") + assert isinstance(err, RuntimeError) + + def test_message_includes_non_retryable_prefix(self) -> None: + err = FatalHarnessError("Credit balance is too low") + assert "non-retryable" in str(err) + assert "Credit balance is too low" in str(err) + + def test_original_message_preserved(self) -> None: + err = FatalHarnessError("some error") + assert err.original_message == "some error" + + +# --------------------------------------------------------------------------- +# check_fatal_harness_error — HarnessResult inspection +# --------------------------------------------------------------------------- + + +@dataclass +class FakeResult: + is_error: bool = False + error_message: Optional[str] = None + + +class TestCheckFatalHarnessError: + def test_no_error_passes(self) -> None: + result = FakeResult(is_error=False, 
error_message="Credit balance is too low") + check_fatal_harness_error(result) # Should not raise + + def test_transient_error_passes(self) -> None: + result = FakeResult(is_error=True, error_message="Rate limit exceeded") + check_fatal_harness_error(result) # Should not raise + + def test_fatal_error_raises(self) -> None: + result = FakeResult(is_error=True, error_message="Credit balance is too low") + with pytest.raises(FatalHarnessError, match="Credit balance is too low"): + check_fatal_harness_error(result) + + def test_fatal_error_invalid_key_raises(self) -> None: + result = FakeResult(is_error=True, error_message="Invalid API key") + with pytest.raises(FatalHarnessError): + check_fatal_harness_error(result) + + def test_none_error_message_passes(self) -> None: + result = FakeResult(is_error=True, error_message=None) + check_fatal_harness_error(result) # Should not raise + + def test_empty_error_message_passes(self) -> None: + result = FakeResult(is_error=True, error_message="") + check_fatal_harness_error(result) # Should not raise + + +# --------------------------------------------------------------------------- +# envelope.py integration — FatalHarnessError on envelope errors +# --------------------------------------------------------------------------- + + +class TestEnvelopeFatalError: + """Verify unwrap_call_result raises FatalHarnessError for fatal envelope errors.""" + + def test_envelope_fatal_error_raises(self) -> None: + envelope = { + "execution_id": "test-123", + "status": "failed", + "error_message": "Credit balance is too low", + "result": None, + } + with pytest.raises(FatalHarnessError, match="Credit balance is too low"): + from swe_af.execution.envelope import unwrap_call_result + unwrap_call_result(envelope, label="test") + + def test_envelope_transient_error_raises_runtime(self) -> None: + envelope = { + "execution_id": "test-123", + "status": "failed", + "error_message": "some transient failure", + "result": None, + } + with 
pytest.raises(RuntimeError, match="some transient failure"): + from swe_af.execution.envelope import unwrap_call_result + unwrap_call_result(envelope, label="test") + # Ensure it's NOT a FatalHarnessError + try: + from swe_af.execution.envelope import unwrap_call_result + unwrap_call_result(envelope, label="test") + except FatalHarnessError: + pytest.fail("Transient error should not raise FatalHarnessError") + except RuntimeError: + pass # Expected + + def test_envelope_success_passes(self) -> None: + envelope = { + "execution_id": "test-123", + "status": "success", + "result": {"plan": []}, + } + from swe_af.execution.envelope import unwrap_call_result + result = unwrap_call_result(envelope, label="test") + assert result == {"plan": []}
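
The test suite above fully specifies the contract for `swe_af.execution.fatal_error`. As a reading aid, here is a minimal sketch of a module that would satisfy those tests; the pattern list and regexes are illustrative inferences from the parametrized cases, not the project's actual source:

```python
"""Sketch of a fatal-error detection module matching the tests above."""
import re
from typing import Optional

# Patterns (matched case-insensitively via lowercasing) that indicate a
# non-retryable API failure. Illustrative list derived from the test cases;
# the real module may use different or additional patterns.
_FATAL_PATTERNS = [
    r"credit balance is too low",
    r"api key is not valid",
    r"invalid api key",
    r"invalid x-api-key",
    r"authentication failed",
    r"account (has been |is )?disabled",
    r"unauthorized",
    r"insufficient credits",
    r"billing (expired|inactive|suspended)",
    r"quota (has been )?exceeded",
]


def is_fatal_error(message: Optional[str]) -> bool:
    """Return True if the error message indicates a non-retryable failure."""
    if not message:
        return False
    lowered = message.lower()
    return any(re.search(pattern, lowered) for pattern in _FATAL_PATTERNS)


class FatalHarnessError(RuntimeError):
    """Raised for API errors that no amount of retrying can fix."""

    def __init__(self, message: str) -> None:
        self.original_message = message
        super().__init__(f"Fatal (non-retryable) harness error: {message}")


def check_fatal_harness_error(result) -> None:
    """Raise FatalHarnessError if a harness result carries a fatal error.

    `result` is duck-typed: anything with `is_error` and `error_message`.
    """
    if result.is_error and result.error_message and is_fatal_error(result.error_message):
        raise FatalHarnessError(result.error_message)
```

Under this reading, `unwrap_call_result` would run the same check on a failed envelope's `error_message` before falling back to a plain `RuntimeError`, so fatal failures short-circuit every retry layer while transient ones remain retryable.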