4 changes: 4 additions & 0 deletions Dockerfile
@@ -45,6 +45,10 @@ RUN uv pip install --system -r /app/requirements.txt
# Copy application code
COPY . /app/

# Pre-create /workspaces so named-volume mounts inherit correct permissions
# (without this, Docker creates it as root read-only on fresh deployments)
RUN mkdir -p /workspaces && chmod 777 /workspaces

EXPOSE 8003

ENV PORT=8003 \
179 changes: 179 additions & 0 deletions docs/deployment.md
@@ -0,0 +1,179 @@
# Deployment Guide

This guide covers deploying SWE-AF on a new server, including prerequisites, known issues, and quick-start instructions.

## Prerequisites

### Software

| Requirement | Minimum Version | Notes |
|---|---|---|
| Docker | 20.10+ | With BuildKit support |
| Docker Compose | 2.0+ | V2 plugin (`docker compose`, not `docker-compose`) |
| Git | 2.30+ | For cloning the repository |

### Environment Variables

Copy `.env.example` to `.env` and configure at least one authentication method:

```bash
cp .env.example .env
```

**Required (one of):**

| Variable | Purpose |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key for Claude models |
| `CLAUDE_CODE_OAUTH_TOKEN` | Claude Code subscription token (uses Pro/Max credits) |

**For open-source models (alternative to Claude):**

| Variable | Purpose |
|---|---|
| `OPENROUTER_API_KEY` | OpenRouter API key (200+ models) |
| `OPENAI_API_KEY` | OpenAI API key |
| `GOOGLE_API_KEY` | Google Gemini API key |

**Optional:**

| Variable | Purpose | Default |
|---|---|---|
| `GH_TOKEN` | GitHub PAT with `repo` scope for draft PRs | *(none)* |
| `AGENTFIELD_SERVER` | Control plane URL | `http://control-plane:8080` (Docker) |
| `NODE_ID` | Agent node identifier | `swe-planner` |
| `PORT` | Agent listen port | `8003` |
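
A minimal `.env` for a Claude-backed deployment might look like the fragment below. Values are placeholders; include only the variables relevant to your setup:

```bash
# Authentication — exactly one method is required
ANTHROPIC_API_KEY=sk-ant-xxxx

# Optional: enable draft PR creation (PAT needs `repo` scope)
GH_TOKEN=ghp_xxxx

# Optional overrides (defaults shown)
NODE_ID=swe-planner
PORT=8003
```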

### Package Versions

| Package | Minimum Version | Notes |
|---|---|---|
| `agentfield` | 0.1.67+ | Python SDK (includes opencode v1.4+ fix) |
| `claude-agent-sdk` | 0.1.20+ | Claude runtime |
| opencode CLI | 1.4+ | Only if using `open_code` runtime (see Known Issues) |

## Quick Start

### Full Stack (control plane + agent)

```bash
git clone https://github.com/Agent-Field/SWE-AF
cd SWE-AF
cp .env.example .env # fill in API keys
docker compose up -d
```

This starts:
- **control-plane** on `:8080` — AgentField orchestration server
- **swe-agent** on `:8003` — SWE-AF full pipeline (`swe-planner` node)
- **swe-fast** on `:8004` — SWE-AF fast mode (`swe-fast` node)

### Agent Only (connect to existing control plane)

If you already have an AgentField control plane running:

```bash
git clone https://github.com/Agent-Field/SWE-AF
cd SWE-AF
cp .env.example .env # fill in API keys

# Set AGENTFIELD_SERVER in .env to your control plane URL
docker compose -f docker-compose.local.yml up -d
```

### Verify Deployment

```bash
# Check agent health
curl http://localhost:8003/health

# Check control plane (full stack only)
curl http://localhost:8080/api/v1/health
```

## Known Issues and Fixes

### `/workspaces` read-only filesystem error

**Symptom:**
```
[Errno 30] Read-only file system: '/workspaces'
```

**Root cause:** The `/workspaces` directory was not pre-created in the Docker image. When Docker mounts a named volume, it creates the directory as root with restrictive permissions.

**Fix:** This is fixed in the current Dockerfile. If you're using an older image, rebuild:
```bash
docker compose build --no-cache
```

The fix adds `RUN mkdir -p /workspaces && chmod 777 /workspaces` to the Dockerfile before the volume mount point.

**Ref:** [#46](https://github.com/Agent-Field/SWE-AF/issues/46)

### `Product manager failed to produce a valid PRD` with `open_code` runtime

**Symptom:** Builds using the `open_code` runtime fail at the Product Manager step with a generic error. The agent completes in a few seconds (too fast for real work).

**Root cause:** opencode CLI v1.4+ changed its CLI interface:
- `-p` (prompt) flag was removed — prompt is now a positional arg to the `run` subcommand
- `-c` now means `--continue` (resume session), not project directory
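
As an illustration of the interface change (command shapes inferred from the notes above — consult `opencode --help` for your installed version):

```bash
# opencode < 1.4 (old interface): prompt via -p, project directory via -c
opencode -p "Write a PRD for feature X" -c /path/to/project

# opencode >= 1.4: prompt is a positional arg to `run`;
# -c now means --continue (resume session), so cd into the project instead
cd /path/to/project
opencode run "Write a PRD for feature X"
```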

**Fix:** Upgrade the `agentfield` Python SDK to a version that includes the opencode v1.4+ compatibility fix:
```bash
pip install --upgrade agentfield
```

**Ref:** [#45](https://github.com/Agent-Field/SWE-AF/issues/45)

### Fatal API errors silently retry

**Symptom:** A build with exhausted credits or an invalid API key retries multiple times before failing with a misleading error (e.g., "Product manager failed to produce a valid PRD").

**Root cause:** Non-retryable API errors (credit exhaustion, invalid key) were not distinguished from transient errors, causing all retry layers to fire.

**Fix:** This is fixed in the current version. Upgrade to get `FatalHarnessError` detection that immediately aborts on:
- Credit balance too low
- Invalid API key
- Authentication failed
- Account disabled
- Quota exceeded
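
Conceptually, the classification is substring matching on the provider's error message. The sketch below is illustrative only — the shipped implementation lives in `swe_af/execution/fatal_error.py` and its exact patterns may differ:

```python
# Sketch of fatal-vs-transient error classification (illustrative patterns).
FATAL_PATTERNS = (
    "credit balance is too low",
    "invalid api key",
    "authentication failed",
    "account disabled",
    "quota exceeded",
)


class FatalHarnessError(Exception):
    """Non-retryable harness failure — abort the build immediately."""


def is_fatal_error(message: str) -> bool:
    """Return True when the error should bypass every retry layer."""
    msg = message.lower()
    return any(pattern in msg for pattern in FATAL_PATTERNS)
```

Retry loops then re-raise `FatalHarnessError` instead of swallowing it, so the build fails fast with the real cause.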

**Ref:** [#49](https://github.com/Agent-Field/SWE-AF/issues/49)

### Parallel builds cross-contamination

**Symptom:** Running two builds simultaneously for the same repository causes agents to receive input from the wrong build.

**Root cause:** Both builds cloned to the same workspace path (`/workspaces/<repo-name>`), sharing git state and artifacts.

**Fix:** This is fixed in the current version. Each build now gets an isolated workspace: `/workspaces/<repo-name>-<build_id>`.
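
The isolation scheme can be sketched as follows (mirrors the `build_id` logic in `swe_af/app.py`; the helper name here is hypothetical):

```python
import uuid


def workspace_path(repo_name: str, root: str = "/workspaces") -> str:
    """Derive a build-scoped clone directory so concurrent builds never
    share git state, artifacts, or worktrees."""
    build_id = uuid.uuid4().hex[:8]  # short unique ID, generated per build
    return f"{root}/{repo_name}-{build_id}"
```

Because the `build_id` suffix is generated before workspace setup, two simultaneous builds of the same repository land in distinct directories.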

**Ref:** [#43](https://github.com/Agent-Field/SWE-AF/issues/43)

## Scaling

### Multiple concurrent builds

Each build automatically gets an isolated workspace. To run multiple builds concurrently:

```bash
# Scale the agent service
docker compose up --scale swe-agent=3 -d
```

### Resource considerations

Each build clones the target repository and runs multiple LLM calls. Plan for:
- **Disk:** ~500MB per concurrent build (repo clone + artifacts)
- **Memory:** ~512MB per agent container
- **Network:** LLM API calls are the bottleneck, not compute

## Troubleshooting

| Symptom | Check |
|---|---|
| Agent not registering with control plane | Verify `AGENTFIELD_SERVER` is reachable from the container |
| Builds timing out | Check API key validity and credit balance |
| `git clone` failures | Verify `GH_TOKEN` has `repo` scope for private repositories |
| Health check failing | Check container logs: `docker compose logs swe-agent` |
2 changes: 1 addition & 1 deletion requirements-docker.txt
@@ -2,6 +2,6 @@
#
# Same runtime dependencies as requirements.txt.

agentfield>=0.1.41
agentfield>=0.1.67
pydantic>=2.0
claude-agent-sdk==0.1.20
2 changes: 1 addition & 1 deletion requirements.txt
@@ -2,6 +2,6 @@
#
# Install: python -m pip install -r requirements.txt

agentfield>=0.1.9
agentfield>=0.1.67
pydantic>=2.0
claude-agent-sdk==0.1.20
28 changes: 17 additions & 11 deletions swe_af/app.py
@@ -195,18 +195,30 @@ async def build(
if repo_url:
cfg.repo_url = repo_url

# Auto-derive repo_path from repo_url when not specified
# Generate build_id BEFORE workspace setup so each concurrent build
# gets a fully isolated workspace (repo clone, artifacts, worktrees).
# Fixes cross-contamination when parallel builds target the same repo.
# Ref: https://github.com/Agent-Field/SWE-AF/issues/43
build_id = uuid.uuid4().hex[:8]

# Auto-derive repo_path from repo_url when not specified.
# Each build gets its own clone directory scoped by build_id to prevent
# concurrent builds from sharing git state, artifacts, or worktrees.
if cfg.repo_url and not repo_path:
repo_path = f"/workspaces/{_repo_name_from_url(cfg.repo_url)}"
repo_name = _repo_name_from_url(cfg.repo_url)
repo_path = f"/workspaces/{repo_name}-{build_id}"

# Multi-repo: derive repo_path from primary repo; _clone_repos handles cloning later
if not repo_path and len(cfg.repos) > 1:
primary = next((r for r in cfg.repos if r.role == "primary"), cfg.repos[0])
repo_path = f"/workspaces/{_repo_name_from_url(primary.repo_url)}"
repo_name = _repo_name_from_url(primary.repo_url)
repo_path = f"/workspaces/{repo_name}-{build_id}"

if not repo_path:
raise ValueError("Either repo_path or repo_url must be provided")

app.note(f"Build starting (build_id={build_id})", tags=["build", "start"])

# Clone if repo_url is set and target doesn't exist yet
git_dir = os.path.join(repo_path, ".git")
if cfg.repo_url and not os.path.exists(git_dir):
@@ -222,8 +234,8 @@
app.note(f"Clone failed (exit {clone_result.returncode}): {err}", tags=["build", "clone", "error"])
raise RuntimeError(f"git clone failed (exit {clone_result.returncode}): {err}")
elif cfg.repo_url and os.path.exists(git_dir):
# Repo already cloned by a prior build — reset to remote default branch
# so git_init creates the integration branch from a clean baseline.
# Repo already exists at this build-scoped path (unlikely but handle gracefully).
# Reset to remote default branch for a clean baseline.
default_branch = cfg.github_pr_base or "main"
app.note(
f"Repo already exists at {repo_path} — resetting to origin/{default_branch}",
@@ -290,12 +302,6 @@
# Resolve runtime + flat model config once for this build.
resolved = cfg.resolved_models()

# Unique ID for this build — namespaces git branches/worktrees to prevent
# collisions when multiple builds run concurrently on the same repository.
build_id = uuid.uuid4().hex[:8]

app.note(f"Build starting (build_id={build_id})", tags=["build", "start"])

# Compute absolute artifacts directory path for logging
abs_artifacts_dir = os.path.join(os.path.abspath(repo_path), artifacts_dir)

9 changes: 9 additions & 0 deletions swe_af/execution/coding_loop.py
@@ -22,6 +22,7 @@
from typing import Callable


from swe_af.execution.fatal_error import FatalHarnessError
from swe_af.execution.schemas import (
DAGState,
ExecutionConfig,
@@ -325,6 +326,8 @@ async def _run_default_path(
timeout=timeout,
label=f"review:{issue_name}:default",
)
except FatalHarnessError:
raise
except Exception as e:
if note_fn:
note_fn(
@@ -437,6 +440,8 @@ async def _run_flagged_path(
tags=["coding_loop", "review_error", issue_name],
)
review_result = {"approved": True, "blocking": False, "summary": f"Review unavailable: {review_result}"}
except FatalHarnessError:
raise
except Exception as e:
if note_fn:
note_fn(
@@ -479,6 +484,8 @@ async def _run_flagged_path(
timeout=timeout,
label=f"synthesizer:{issue_name}:iter{iteration}",
)
except FatalHarnessError:
raise
except Exception as e:
if note_fn:
note_fn(
@@ -625,6 +632,8 @@ async def run_coding_loop(
timeout=timeout,
label=f"coder:{issue_name}:iter{iteration}",
)
except FatalHarnessError:
raise
except Exception as e:
if note_fn:
note_fn(
9 changes: 9 additions & 0 deletions swe_af/execution/dag_executor.py
@@ -11,6 +11,7 @@

from swe_af.execution.dag_utils import apply_replan, find_downstream
from swe_af.execution.envelope import unwrap_call_result
from swe_af.execution.fatal_error import FatalHarnessError
from swe_af.execution.schemas import (
AdvisorAction,
DAGState,
@@ -576,6 +577,8 @@ async def _cleanup_single_repo(
f"cleaned={result.get('cleaned', [])}",
tags=["execution", "worktree_cleanup", "warning"],
)
except FatalHarnessError:
raise
except Exception as e:
if note_fn:
note_fn(
@@ -868,6 +871,8 @@ async def _execute_single_issue(
timeout=config.agent_timeout_seconds,
label=f"issue_advisor:{issue_name}:{advisor_round + 1}",
)
except FatalHarnessError:
raise
except Exception as e:
if note_fn:
note_fn(
@@ -1064,6 +1069,8 @@ async def _run_execute_fn(
attempts=attempt,
)

except FatalHarnessError:
raise
except Exception as e:
last_error = str(e)
last_context = traceback.format_exc()
@@ -1095,6 +1102,8 @@
"retry_diagnosis": advice.get("diagnosis", ""),
}
continue
except FatalHarnessError:
raise
except Exception:
continue
elif attempt <= config.max_retries_per_issue:
4 changes: 4 additions & 0 deletions swe_af/execution/envelope.py
@@ -12,6 +12,8 @@

from __future__ import annotations

from swe_af.execution.fatal_error import FatalHarnessError, is_fatal_error

# Keys present in the execution envelope returned by _build_execute_response.
_ENVELOPE_KEYS = frozenset({
"execution_id", "run_id", "node_id", "type", "target",
@@ -51,6 +53,8 @@ def unwrap_call_result(result, label: str = "call"):
status = str(result.get("status", "")).lower()
if status in ("failed", "error", "cancelled", "timeout"):
err = result.get("error_message") or result.get("error") or "unknown"
if is_fatal_error(str(err)):
raise FatalHarnessError(str(err))
raise RuntimeError(f"{label} failed (status={status}): {err}")

inner = result.get("result")