NOT READY : Add task 029 by tihilya · Pull Request #67 · iron-cage/sdk

tihilya · 2026-05-13T10:01:06Z

No description provided.

wanguardd

System of Review completed analysis of 'Add task 029'. Found 11 findings across the task file.

📋 Review Summary

Category	🔴 Blocking	🟡🟠 Non-Blocking	Total
📋 REQUIREMENTS	8	1	9
🐛 CORRECTNESS	1	0	1
📖 DOCUMENTATION	0	1	1
Total	9	2	11

📋 REQUIREMENTS

[🔴 Blocking | INTRODUCED] ## Goal section absent — task file uses non-template headings instead of the required MOST Goal block.

The task file opens with ## The request (a narrative brief) and ## What should work (an outcome statement). Neither satisfies the mandatory MOST Goal: ## What should work describes a positive end-state but provides no runnable verification command (Testable), no explicit "why this task now" rationale (Motivated), and does not enumerate the exact artifacts and APIs being created (Observable). An executor claiming this task has no testable definition of done.

Evidence:

# task/029_freeform_onboarding.md — section headers
## The request          ← non-template heading
## What should work     ← non-template heading; no Goal/MOST present
## Admin path
## Member path
...

# tsk.rulebook.md § Task File Content : Task File Template Structure
## Goal
[One-paragraph MOST goal: state the observable end-state (exact artifacts/APIs),
why it matters now (Motivated), the single bounded deliverable (Scoped),
and the runnable command that proves success (Testable).]

Action:

Replace ## The request and ## What should work with a single ## Goal section containing a MOST-compliant paragraph: name the exact APIs, routes, and DB tables being created (Observable); state why this task is prioritized in the current development wave (Motivated); bound to a single deliverable (Scoped); and provide a runnable verification command, e.g. an integration test invocation (Testable).

[🔴 Blocking | INTRODUCED] ## Work Procedure section absent — no ordered execution steps exist for an implementor.

The task spec describes what the finished product looks like across 18 slides, but contains no instructions for building it. An executor has no sequence of steps to follow: no "read rulebooks first," no "write test matrix before test code," no TDD sequence, no grouping of sub-features into buildable increments.

Evidence:

# grep "^## Work Procedure" task/029_freeform_onboarding.md
(no match)

# tsk.rulebook.md § Task File Content : Work Procedure Requirement
## Work Procedure
Execute in order. Do not skip or reorder steps.
1. Read rulebooks — kbase .rulebooks
2. Write Test Matrix — populate every row before opening any test file
3. Write failing tests — implement every test case from the Test Matrix
4. Implement — write minimum code to make tests pass
...

Action:

Add ## Work Procedure with ordered numbered steps per the tsk.rulebook.md template, grouped by sub-feature: (1) auth surface (magic-link endpoints + schema), (2) user registration schema migration, (3) AI inference backend endpoint, (4) FreeForm toggle frontend component, (5) invite link generation and consumption endpoints, (6) model access request/approval flow, (7) dashboard onboarding views. Each step should reference the specific file, route, or migration being created or modified.

[🔴 Blocking | INTRODUCED] ## Validation section absent — no checklist, measurements, invariants, or anti-faking checks defined.

Without a validation section, an independent validator has no criteria to check against after implementation. The sole acceptance criterion ("completes end-to-end in a real browser") can only be tested manually against a deployed environment and cannot be run in CI — it cannot serve as a validation gate.

Evidence:

# grep "^## Validation" task/029_freeform_onboarding.md
(no match)

# task/029_freeform_onboarding.md line 210 (## Acceptance)
"The admin path and member path above complete end-to-end in a real browser
against the deployed control API, with the deck's 60-second promise holding
on a recorded run."
← demo-only criterion; not runnable in CI

# tsk.rulebook.md § Task File Content : Validation Requirement
## Validation
### Checklist      (boolean yes/no questions)
### Measurements   (quantified checks with commands)
### Invariants     (I1: w3 .test level::3 → 0 failures; I2: compiler clean)
### Anti-faking checks (commands that cannot pass without genuine implementation)

Action:

Add ## Validation with all four subsections. ### Checklist must cover: FreeForm toggle renders on each form surface; AI inference endpoint returns structured JSON given free-text input; invite link generation returns a URL matching the ic_team_* prefix; seat decrement is atomic under concurrent requests. ### Invariants must include I1 (w3 .test level::3 → 0 failures) and I2 (compiler clean). ### Anti-faking checks must include at least one curl-invocable endpoint check that confirms the feature exists at the API layer.

[🔴 Blocking | INTRODUCED] ## Execution State block absent — task cannot be claimed, tracked, or transitioned by any actor.

Every task file must open with the Execution State block that drives the task lifecycle: a new task starts at 🎯 (Available), an actor claims it (setting Actor + Claimed At), and the status progresses through the lifecycle. Without this block the task management machinery has no state to read or write.

Evidence:

# grep "^## Execution State" task/029_freeform_onboarding.md
(no match)

# tsk.rulebook.md § Task File Content : Actor Tracking Fields — mandatory template block:
## Execution State
- **Executor Type:** any
- **Actor:** null
- **Claimed At:** null
- **Status:** 🎯 (Available)
- **Closes:** null

Action:

Add the ## Execution State block as the first section after the file title (before ## Goal), with all five fields initialized to their new-task defaults per the template.

[🔴 Blocking | INTRODUCED] Slide 11 and ## Universal behavior contain a direct logical contradiction that leaves transactionality semantics undefined.

Slide 11 (line 99) states: "Apply runs the whole block in one transaction — either everything persists, or nothing does." The Universal behavior section (line 197) states: "Re-applying any paste is a clean no-op (idempotent)." These are contradictory when the paste block contains items that already exist — for example, a provider key already registered, or an email already in the system. Under the transactional rule, a pre-existing duplicate triggers a full rollback. Under the idempotency rule, the duplicate is skipped and new items are applied. Both behaviors are present in the spec with equal authority.

Evidence:

# task/029_freeform_onboarding.md
Line 99:  "Apply runs the whole block in one transaction — either everything
           persists, or nothing does."
Line 197: "Re-applying any paste is a clean no-op (idempotent). Invalid input
           rejects the whole block with per-line errors; nothing partial is persisted."

If the paste contains alice@acmecorp.com (already invited) and 9 new emails:

Transactional semantics: all 10 are rolled back because of the duplicate.
Idempotent semantics: alice is skipped, 9 new invitations are created.

Action:

Pick one semantic and remove the contradiction. Option A (idempotent/upsert): re-applying skips already-persisted items and applies only new ones — specify which condition triggers "already exists" (exact match on email/key, not fuzzy). Option B (strict transactional): any duplicate causes the entire apply to fail with an error listing all conflicting items — add this to the error paths specification. Whichever is chosen, update line 197 (## Universal behavior) to be consistent with slide 11 or vice versa.

[🔴 Blocking | INTRODUCED] Slide 1 replaces the existing password-based auth surface with magic-link login but provides no migration spec, no endpoint definitions, and no path for existing users.

The spec states "This is the only authentication surface — there is no separate sign-up" (line 21), declaring a complete replacement of the login flow. The existing auth system uses bcrypt-hashed passwords stored in users.password_hash (NOT NULL constraint). The login endpoint calls authenticate_user(&state.db_pool, &request.email, &request.password). There is no magic-link token table, no email delivery mechanism, and no one-time-use token validation in the codebase. An implementor building slide 1 must decide unilaterally whether to (a) break existing admin users who have passwords, (b) run both systems in parallel, or (c) convert all password-bearing rows — none of which the spec addresses.

Evidence:

# module/iron_token_manager/migrations/003_create_users_table.sql line 18
password_hash TEXT NOT NULL CHECK (LENGTH(password_hash) > 0 AND LENGTH(password_hash) <= 255)
-- Constraint: password_hash cannot be NULL or empty

# module/iron_control_api/src/routes/auth/handlers.rs line 138
authenticate_user(&state.db_pool, &request.email, &request.password)
-- Existing login calls authenticate_user with email+password

# task/029_freeform_onboarding.md line 21
"The admin enters their email and receives a magic-link sign-in. No password.
This is the only authentication surface — there is no separate sign-up."
-- Declares complete replacement with zero migration guidance

Action:

Add auth migration scope to ## In Scope: either (a) scope this task to magic-link as a parallel path (do not remove password login) and spec two new endpoints (POST /api/v1/auth/magic-link/request, GET /api/v1/auth/magic-link/verify) with a new magic_link_tokens table (id, user_id, token_hash, expires_at, used_at); or (b) scope this task to include the full replacement migration with a procedure for converting existing password_hash entries and a sunset plan for the password login endpoint.

[🔴 Blocking | INTRODUCED] Slides 2, 4, 15, and 17 collect First Name, Last Name, and Birthday fields that have no corresponding columns in the database schema across all 30 migrations.

The spec describes collecting and confirming three registration profile fields via AI inference. The users table has a single name column (TEXT, nullable) for user profile data. No migration in the 30-migration history adds first_name, last_name, or birthday columns. An implementor following slide 4 (prefilling "Jane" / "Smith" / "02/23/1982") has nowhere to persist the confirmed data.

Evidence:

# grep first_name|last_name|birthday across all 30 migrations in
# module/iron_token_manager/migrations/
→ 0 matches

# module/iron_token_manager/migrations/003_create_users_table.sql
users table columns: id, username, password_hash, email, role, name,
                     is_active, created_at
(no first_name, no last_name, no birthday)

# task/029_freeform_onboarding.md line 39
"the same classic form reopens with all three fields prefilled
(Jane / Smith / 02/23/1982)"

Action:

Add to ## In Scope: a new migration (migration 031 or later) adding first_name TEXT CHECK(LENGTH(first_name) <= 100), last_name TEXT CHECK(LENGTH(last_name) <= 100), and birthday TEXT CHECK(birthday GLOB '[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]') to the users table, consistent with the MM/DD/YYYY format referenced in the spec. Alternatively, explicitly defer profile expansion and scope slides 2–4 to use the existing name field (First + Last concatenated).

[🔴 Blocking | INTRODUCED] No error paths specified for any FreeForm surface — the spec describes only happy-path flows for all 18 actionable slides.

Every FreeForm surface involves user input → backend processing → frontend display. The spec provides no guidance on any failure case. An implementor cannot build a complete feature without knowing what to display when the AI inference call fails, when input is malformed, or when the apply transaction is rejected.

Evidence:

# Error path coverage in task/029_freeform_onboarding.md:
Line 197: "Invalid input rejects the whole block with per-line errors"
→ only mention of errors; format unspecified, trigger condition vague

# Missing error paths:
1. Slide 3/6: AI inference backend returns 5xx → user sees what?
2. Slide 7: AI infers Account Type value not in enum → reject? default? ask user?
3. Slide 10: Provider key fails format validation → per-line error, but what format?
4. Slide 14: Invite link expired → member sees what page?
5. Slide 14: Invite link exhausted (0 seats remaining) → member sees what?
6. Slide 17: Seat decrement race (two members claim last seat simultaneously) → which succeeds?

Action:

Add an ## Error Paths section (or inline under each relevant slide) specifying for each FreeForm surface: (1) AI backend failure — show error inline with "Try again" option, do not persist; (2) enum mismatch — reject the inferred value and highlight the field for manual entry; (3) provider key format validation — per-line error listing the offending line number and expected format per provider; (4) expired invite link — render a dedicated "This link has expired" page with a prompt to contact the workspace admin; (5) exhausted invite link — render "This team is full" page. Invite the spec author to define the exact JSON error contract between backend and frontend for the "per-line errors" in Universal behavior.

🐛 CORRECTNESS

[🔴 Blocking | INTRODUCED] Easiness score E=5 is 2–3 points too high — every sub-feature in this task is entirely greenfield, making it among the hardest tasks in the index.

E=5 places task 029 at the same implementation difficulty as task 003 (provider key spending/limits), which builds on an already-existing spending cap architecture. Task 029 requires implementing 8 independent new features from scratch, none of which has any scaffolding in the codebase. Using E=5 inflates Advisability from the correct ~720 to 1800, placing the task at rank 8 instead of approximately rank 24–26, and misrepresents the implementation investment to anyone using the index to plan work.

Evidence:

# task/readme.md (PR version): task 029 scored V=10, E=5, P=4, S=9, Adv=1800, rank=8

# Calibration against existing tasks:
Task 017 (Python SDK async bridge, existing pyo3/maturin setup):     E=3, Adv=1500
Task 023 (hardening, existing test infrastructure and patterns):      E=3, Adv=810
Task 022 (provider failover, existing routing infrastructure):        E=4, Adv=864
Task 003 (provider key spending, existing spending cap patterns):     E=5, Adv=1620

# Greenfield sub-features in task 029 with zero codebase scaffolding:
- Magic-link authentication:     grep "magic_link\|passwordless" migrations/ → 0 matches
- Team/workspace schema:          grep "workspace\|company\|team_name" migrations/ → 0 matches
- AI inference backend:           iron_llm_core routes agent proxy traffic, not server-side inference
- Invite link system:             no invite_tokens table, no join route in 30 migrations
- Self-serve user registration:   create_user is admin-only endpoint
- Model access request/approval:  unrelated to existing budget_requests workflow
- New dashboard views:            9 views exist, none are onboarding flows

Action:

Correct E to E=2 (hardest range, comparable to tasks requiring entirely new subsystems with no existing patterns to build on). Recalculate: Adv = V×E×P×S = 10×2×4×9 = 720. Update task 029's row in task/readme.md and re-sort the advisability table — corrected rank is approximately 24–26. Add a brief rationale comment to the task file explaining the E=2 calibration (all 8 sub-features are greenfield, no existing patterns or tables to extend).

📖 DOCUMENTATION

[🟠 Note | INTRODUCED] Slides 19–20 are presentation deck marketing frames, not implementable specification.

The spec numbers 20 slides and uses slide numbering as the organizing principle throughout. Slides 19–20 are deck section title and closing summary slides — they contain no behavioral requirements, no UI states, and no implementation targets. Keeping them numbered as slides in a task spec inflates the apparent scope to 20 implementation targets when 18 is correct.

Evidence:

# task/029_freeform_onboarding.md
Lines 180–184: "Slide 19 — From zero to protected (section title)
  The section title slide that introduces the flow in the deck:
  'From zero to protected in under 60 seconds.'"

Lines 186–190: "Slide 20 — Iron Cage Control Panel (closing summary)
  The closing summary: Zero to 10-person protected team. Under 60 seconds. No forms filled."

Both slides carry no actionable requirement. An implementor counting
"20 slides to implement" would allocate time to non-existent deliverables.

Action:

Label slides 19 and 20 explicitly as *(Deck context — not an implementation target)*, or move them to a separate ## Deck Context section outside the numbered slide sequence, so slide count equals implementation target count.

This is a strong product vision with vivid, concrete UX specs — the 20-slide walkthrough communicates the intended experience clearly. The implementation obstacles identified above are foundational: several require architectural decisions about existing systems (auth, tokens, schema) that must be resolved in the spec before any code is written. Looking forward to the revised task file.

❗ > Note: This System of Review operates with the same information access and constraints as any human developer working on this repository. While the system is highly accurate (99%+ reliability), any confusion or apparent misunderstanding typically indicates gaps in repository documentation, unclear code organization, or insufficient project discipline — the same issues that would impact any team member. If the System cannot locate critical information and draws incorrect conclusions, a human developer would face identical challenges. We maintain high standards to respect our teammates' time, our clients' investment, and the integrity of this project. Please ensure the repository provides clear, discoverable context for all reviewers.

Review approval is a quality floor, not a quality ceiling. It confirms that identified issues are resolved — not that the code is free of all issues. The System reviews the changes in this PR, not the entire codebase. Proactive quality is a developer responsibility: unit test coverage, edge case analysis, and consistency with the surrounding codebase cannot be delegated to any review process. If the review surfaces one instance of a pattern, the developer is responsible for auditing all occurrences of that pattern throughout the codebase — not just the specific line cited.

When addressing review feedback: Open a separate commit for each point you address, clearly referencing the finding in the commit message. This ensures knowledge is captured in the repository history and helps other developers avoid the same pitfalls. If you cannot address a point, leave a detailed comment in this PR review thread explaining specifically what is wrong with the finding or why it cannot be addressed — never ignore feedback silently. Use comments only in rare cases when the System has genuinely missed existing context in the repository; prefer commits as the primary response mechanism to build institutional knowledge.

tihilya added 2 commits May 13, 2026 13:00

Add tasks 029-033

34f9dd7

Consolidate into task 029

db4a22f

tihilya changed the title ~~NEED REVIEW : Add tasks 029-033~~ NEED REVIEW : Add task 029 May 13, 2026

wanguardd requested changes May 13, 2026

View reviewed changes

wanguardd changed the title ~~NEED REVIEW : Add task 029~~ NOT READY : Add task 029 May 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

NOT READY : Add task 029#67

NOT READY : Add task 029#67
tihilya wants to merge 2 commits into
masterfrom
docs/tasks-029-033

tihilya commented May 13, 2026

Uh oh!

wanguardd left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

tihilya commented May 13, 2026

Uh oh!

wanguardd left a comment

Choose a reason for hiding this comment

📋 Review Summary

📋 REQUIREMENTS

🐛 CORRECTNESS

📖 DOCUMENTATION

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants