Multi-agent collaboration causing content duplication and API failures #21

@a-makelky

Notion Doc Version of write up here

Proof Editor Technical Issues Report

Date: March 12, 2026

Reporter: Aaron Makelky (OpenClaw user)

Contact: aaron@aaronmakelky.com or https://x.com/theaaron

Severity: Critical - Platform unusable for multi-agent collaboration


Executive Summary

Over the past 48 hours, our team encountered severe data corruption and service instability while using Proof for collaborative AI agent workflows. Three separate documents became unusable due to content duplication, API failures, and concurrent write conflicts. This undermines Proof's core value proposition as a collaborative editor.

Key Issues:

  1. Content duplication on concurrent agent edits
  2. API returning null values (revision, updatedAt, markdown)
  3. Service-wide 502 errors (Application failed to respond)
  4. No automatic detection/prevention of duplicate content
  5. Broken documents cannot be recovered via API

Impact: Lost work, blocked workflows, forced migration to alternative tools


Timeline of Issues

Incident 1: Document c7vnchuu (March 11-12, 2026)

Document: https://www.proofeditor.ai/d/c7vnchuu?token=fed4aa35-d983-4bd2-98ef-99bca0f27cd3

What happened:

  • Created document for API integration planning
  • Squire Bot (AI agent) attempted to add introduction via API
  • Content was duplicated 8+ times throughout the document
  • Each API write seemed to append rather than replace, despite using correct endpoints

API evidence:

# State API returned null values
GET /api/agent/c7vnchuu/state
Response: {"revision": null, "updatedAt": null, "markdown": null}

# Snapshot API returned 502 errors
GET /api/agent/c7vnchuu/snapshot
Response: {"status":"error","code":502,"message":"Application failed to respond","request_id":"faDGsWp5RrGs2DHz-_9nXA"}

Result: Document completely broken, unrecoverable via API. Had to create new document.
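
A client can refuse to write when the state response looks like the null payload above. The following is a minimal sketch, assuming only the three field names shown in that response; the helper name `missing_state_fields` is hypothetical and not part of Proof's API.

```python
from typing import Any, Mapping

# Field names taken from the state response quoted above.
REQUIRED_STATE_FIELDS = ("revision", "updatedAt", "markdown")


def missing_state_fields(payload: Mapping[str, Any]) -> list[str]:
    """Return the required fields that are absent or null in a state payload."""
    return [name for name in REQUIRED_STATE_FIELDS if payload.get(name) is None]


broken = {"revision": None, "updatedAt": None, "markdown": None}
print(missing_state_fields(broken))  # ['revision', 'updatedAt', 'markdown']
```

An agent could skip its write and report the slug whenever this list is non-empty, rather than editing against an unknown base.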


Incident 2: Document ppjd1v32 (March 12, 2026, ~08:22 MDT)

Document: https://www.proofeditor.ai/d/ppjd1v32?token=49d34836-4e23-4b0e-989c-633facd60a68

What happened:

  • Created fresh document via /share/markdown API
  • Document creation succeeded with valid slug and tokens
  • Attempted to read state and join document
  • API returned null values and 502 errors
  • User reported document appeared blank in browser UI

API evidence:

# Creation succeeded
POST /share/markdown
Response: {"success":true,"slug":"ppjd1v32","accessToken":"49d34836-..."}

# But state API failed
GET /api/agent/ppjd1v32/state
Response: {"status":"error","code":502,"message":"Application failed to respond"}

Bug report filed: Request ID dkoJqCHeTTi9QcHm-_9nXA

Result: Platform-wide outage, all API endpoints returning 502 errors.


Incident 3: Document gb98e9g4 (March 12, 2026, ~19:25 MDT)

Document: https://www.proofeditor.ai/d/gb98e9g4?token=f54fd0af-93fa-42c0-b449-7f2c622552f7

What happened:

  • Created document for Codex agent project planning
  • Two AI agents (Squire Bot and Codex) joined document simultaneously
  • Codex agent attempted to write project plan
  • Content was duplicated multiple times
  • Headers repeated 4+ times with fragments scattered throughout

Current state (via API):

# Vicki-Recipe coding project doc

This space is for 

# Vicki-Recipe coding project doc

# Vicki-Recipe coding project doc

# Vicki-Recipe coding project doc

This space is for Codex agent, Aaron, and Openclaw to collaborate...
[Full Codex plan appears once correctly]
...
# Vicki-Recipe coding pro

# Vick

# Vicki-Recipe coding projec

# Vicki-Recipe coding project doc

This space is for @

# Vicki-Reci

Visual evidence: Screenshot shows overlapping text fragments and repeated headers (available upon request).

Result: Document requires manual cleanup, trust in concurrent editing broken.
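
The pattern in the excerpt above (a heading repeated verbatim, plus truncated copies of it) can be flagged mechanically. This is a rough heuristic sketch, not a Proof feature; the function name `heading_report` is hypothetical, and it treats any heading that is a strict prefix of another heading as a suspected fragment.

```python
from collections import Counter


def heading_report(markdown: str) -> tuple[dict[str, int], list[str]]:
    """Return (headings seen more than once, headings that are prefixes of another heading)."""
    headings = [line.strip() for line in markdown.splitlines() if line.startswith("#")]
    counts = Counter(headings)
    repeated = {h: n for h, n in counts.items() if n > 1}
    fragments = sorted(
        h for h in counts
        if any(other != h and other.startswith(h) for other in counts)
    )
    return repeated, fragments


sample = (
    "# Vicki-Recipe coding project doc\n\n"
    "# Vicki-Recipe coding project doc\n\n"
    "# Vicki-Recipe coding pro\n\n"
    "# Vick\n"
)
repeated, fragments = heading_report(sample)
print(repeated)   # {'# Vicki-Recipe coding project doc': 2}
print(fragments)  # ['# Vick', '# Vicki-Recipe coding pro']
```

The prefix test will also flag legitimate pairs such as "# Intro" and "# Introduction", so the output is a prompt for review rather than proof of corruption.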


Root Cause Analysis

Problem 1: No Concurrency Control for Agent Writes

Issue: When multiple AI agents write to a document simultaneously, Proof's API accepts all writes but applies them incorrectly, leading to content duplication.

Evidence:

  • Documents c7vnchuu and gb98e9g4 show the same pattern: repeated headers and partial edits (ppjd1v32 appeared blank instead)
  • Happens specifically when agents use edit APIs concurrently
  • Does not happen with single-user edits

Hypothesis:

  • API lacks proper optimistic concurrency control
  • baseRevision and baseUpdatedAt parameters are not enforced
  • Writes are applied asynchronously without locking
  • No deduplication or conflict resolution
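
For reference, the optimistic concurrency check that this hypothesis says is missing can be sketched as below. This is an in-memory illustration only, not Proof's implementation; the `Document` class and `StaleRevisionError` are hypothetical names, and the base revision is modeled as a plain integer.

```python
from dataclasses import dataclass


class StaleRevisionError(Exception):
    """Raised when a write is based on an outdated revision (an HTTP 409 in an API)."""


@dataclass
class Document:
    markdown: str = ""
    revision: int = 0

    def apply_write(self, new_markdown: str, base_revision: int) -> int:
        """Apply the write only if the caller saw the current revision."""
        if base_revision != self.revision:
            raise StaleRevisionError(
                f"base revision {base_revision} is stale; current is {self.revision}"
            )
        self.markdown = new_markdown
        self.revision += 1
        return self.revision


doc = Document()
doc.apply_write("# Plan from agent A", base_revision=0)      # accepted, revision becomes 1
try:
    doc.apply_write("# Plan from agent B", base_revision=0)  # agent B read before A wrote
except StaleRevisionError as err:
    print("rejected:", err)
```

In a real service the comparison and the update would need to happen atomically, for example under a lock or as a database compare-and-swap, otherwise two writers can still pass the check at the same time.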

Problem 2: API Instability During Load

Issue: Service returns 502 errors and null values during periods of API activity.

Evidence:

  • Multiple 502 errors across different endpoints (state, snapshot, ops)
  • Null revision/updatedAt values suggest backend state corruption
  • Happened across 3 different documents over 2 days

Hypothesis:

  • Backend services (Y.js projection, database layer) overwhelmed
  • No proper fallback when real-time sync fails
  • State corruption propagates to API layer
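
One way to picture the missing fallback is a reader that keeps the last state with a non-null revision and serves it, marked as stale, when a fresh read fails. This is a sketch under assumptions: `LastKnownGoodState` and the `stale` flag are hypothetical, and `fetch` stands in for whatever call retrieves the state.

```python
from typing import Any, Callable, Optional


class LastKnownGoodState:
    """Wraps a state fetcher and falls back to the last usable result."""

    def __init__(self, fetch: Callable[[], dict[str, Any]]) -> None:
        self._fetch = fetch
        self._last_good: Optional[dict[str, Any]] = None

    def read(self) -> dict[str, Any]:
        try:
            state = self._fetch()
        except Exception:
            state = None  # treat any fetch failure like an unusable response
        if state is not None and state.get("revision") is not None:
            self._last_good = state
            return {**state, "stale": False}
        if self._last_good is None:
            raise RuntimeError("no usable state available yet")
        return {**self._last_good, "stale": True}


responses = iter([
    {"revision": 1, "markdown": "hello"},
    {"revision": None, "markdown": None},
])
reader = LastKnownGoodState(lambda: next(responses))
print(reader.read())  # {'revision': 1, 'markdown': 'hello', 'stale': False}
print(reader.read())  # {'revision': 1, 'markdown': 'hello', 'stale': True}
```

Stale state is only suitable for reads; a write based on it should still be rejected by a revision check.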

Problem 3: No Automatic Corruption Detection

Issue: Proof allows documents to become severely corrupted without any warning or automatic recovery.

Evidence:

  • Documents with 8+ duplicate sections accepted without error
  • No API validation for duplicate content
  • No automatic cleanup or repair mechanism

Technical Evidence Summary

Example API Failures

| Endpoint | Document | Status | Error |
| --- | --- | --- | --- |
| GET /api/agent/c7vnchuu/state | c7vnchuu | 200 | {"revision": null, "updatedAt": null} |
| GET /api/agent/c7vnchuu/snapshot | c7vnchuu | 502 | Application failed to respond |
| GET /api/agent/ppjd1v32/state | ppjd1v32 | 502 | Application failed to respond |
| POST /share/markdown | ppjd1v32 | 200 | Success, but document blank |
| POST /api/agent/gb98e9g4/ops | gb98e9g4 | 200 | Success, but content duplicated |

Request IDs for Investigation

  • faDGsWp5RrGs2DHz-_9nXA (c7vnchuu snapshot 502)
  • V6I4hY4fRFSlv5NFLPU1MQ (ppjd1v32 state 502)
  • dkoJqCHeTTi9QcHm-_9nXA (ppjd1v32 creation, subsequent 502)
  • Ji05m8KBS7KDFlIk9I3ezw (bug report 502)

Suggested Fixes

Critical (P0)

  1. Enforce Concurrency Control
    • Strictly validate baseRevision or baseUpdatedAt on all write operations
    • Reject writes with HTTP 409 Conflict if base is stale
    • Do not apply writes asynchronously without validation
    • Reference: Your own API contract specifies this, but it's not enforced
  2. Add Content Deduplication
    • Detect identical consecutive blocks (e.g., same header repeated 4+ times)
    • Auto-reject or auto-collapse duplicates
    • Add API warning header when duplication detected
  3. Fix State Projection Stability
    • Investigate why revision and updatedAt return null
    • Add fallback to last known good state
    • Implement state recovery from Y.js document
  4. Improve Error Handling
    • Return structured error responses instead of 502
    • Include actionable error codes and retry guidance
    • Add circuit breaker for cascading failures
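
The auto-collapse idea in item 2 above could look roughly like this. It is a sketch, assuming blocks are separated by blank lines; `collapse_consecutive_duplicates` is a hypothetical helper, and it only catches exact consecutive repeats, not the truncated fragments seen in gb98e9g4.

```python
def collapse_consecutive_duplicates(markdown: str) -> tuple[str, int]:
    """Drop blocks identical to the block right before them; return (text, blocks removed)."""
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    kept: list[str] = []
    removed = 0
    for block in blocks:
        if kept and kept[-1] == block:
            removed += 1  # exact repeat of the previous block
        else:
            kept.append(block)
    return "\n\n".join(kept), removed


text = "# Title\n\n# Title\n\n# Title\n\nBody text\n"
cleaned, removed = collapse_consecutive_duplicates(text)
print(removed)        # 2
print(repr(cleaned))  # '# Title\n\nBody text'
```

Because an author may repeat a block on purpose, a server would more safely surface the count as a warning than rewrite content silently.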

High Priority (P1)

  1. Add Agent Write Coordination
    • Implement presence-based write locking (optional)
    • Queue concurrent writes and apply sequentially
    • Add X-Request-Id to all responses for debugging
  2. Document Recovery Tools
    • Add /api/agent/<slug>/recover endpoint
    • Allow rollback to specific revision
    • Provide diff view for corrupted documents
  3. Monitoring and Alerting
    • Add anomaly detection for duplicate content
    • Alert on elevated 502 rates
    • Dashboard for API health by document
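
"Queue concurrent writes and apply sequentially" (item 1 above) can be illustrated with a single worker that drains a FIFO queue. This is an in-process stand-in for what would be server-side logic; the `WriteQueue` class and its method names are hypothetical.

```python
import queue
import threading


class WriteQueue:
    """Accepts writes from any thread and applies them one at a time, in arrival order."""

    def __init__(self) -> None:
        self.blocks: list[str] = []
        self._pending: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, agent: str, text: str) -> None:
        self._pending.put((agent, text))

    def _run(self) -> None:
        while True:
            agent, text = self._pending.get()
            # Only this worker thread mutates the document, so writes never interleave.
            self.blocks.append(f"[{agent}] {text}")
            self._pending.task_done()

    def drain(self) -> None:
        """Block until every submitted write has been applied."""
        self._pending.join()


doc = WriteQueue()
doc.submit("Squire Bot", "Introduction")
doc.submit("Codex", "Project plan")
doc.drain()
print(doc.blocks)  # ['[Squire Bot] Introduction', '[Codex] Project plan']
```

Each write lands whole because a single consumer applies it, which is the property the duplicated and truncated headers suggest is missing.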

Medium Priority (P2)

  1. Better Documentation
    • Document concurrency semantics clearly
    • Provide best practices for multi-agent workflows
    • Add examples with proper error handling
  2. Client-Side Validation
    • JavaScript SDK should check for duplicates before sending
    • Add retry logic with exponential backoff
    • Implement local conflict resolution
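
The retry logic mentioned above might be sketched as follows, in Python rather than JavaScript for brevity. `TransientError` is a hypothetical stand-in for a retryable failure such as a 502, and the delay values are illustrative defaults, not recommendations from Proof.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a retryable failure such as an HTTP 502."""


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on TransientError with capped exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries, surface the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries
    raise AssertionError("unreachable")
```

Retrying a write is only safe when the server deduplicates it, so this pairs with an idempotency key rather than replacing one.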

Workarounds for Users (Until Fixed)

  1. Serialize Agent Writes
    • Only one agent should write at a time
    • Use events/pending API to wait for previous writes to complete
    • Always read current state before writing
  2. Use edit/v2 with Block Refs
    • Prefer precise block operations over full rewrites
    • Include Idempotency-Key header
    • Use baseRevision from latest snapshot
  3. Monitor for Corruption
    • Periodically read document state via API
    • Check for duplicate headers or fragments
    • Create new document if corruption detected
  4. Have a Backup Plan
    • Don't rely on Proof as sole source of truth
    • Keep critical content in local files or other tools
    • Consider self-hosting proof-sdk for reliability
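
Workarounds 1 and 2 combine into a read-then-write routine: take the latest snapshot, refuse to proceed if it has no revision, and attach both the base revision and an idempotency key. The sketch below only builds the request and sends nothing. The exact path, the `operations` field, and the payload shape are assumptions made for illustration; only `edit/v2`, `baseRevision`, and the `Idempotency-Key` header come from this report.

```python
import uuid
from typing import Any, Optional


def build_edit_request(
    slug: str,
    snapshot: dict[str, Any],
    operations: list[dict[str, Any]],
    idempotency_key: Optional[str] = None,
) -> dict[str, Any]:
    """Describe an edit request based on the latest snapshot (nothing is sent)."""
    revision = snapshot.get("revision")
    if revision is None:
        raise ValueError("snapshot has no revision; refusing to write")
    return {
        "method": "POST",
        "path": f"/api/agent/{slug}/edit/v2",  # assumed path, check the API docs
        "headers": {"Idempotency-Key": idempotency_key or str(uuid.uuid4())},
        "json": {"baseRevision": revision, "operations": operations},  # assumed body shape
    }


request = build_edit_request("gb98e9g4", {"revision": 12}, operations=[], idempotency_key="plan-v1")
print(request["json"]["baseRevision"])  # 12
```

Reusing the same key when retrying the same logical write lets a server that honors idempotency keys drop the repeat.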

Example Documents for Investigation

Broken Document 1:

https://www.proofeditor.ai/d/c7vnchuu?token=fed4aa35-d983-4bd2-98ef-99bca0f27cd3

  • Status: Completely corrupted (null API values, 502 errors)
  • Issue: Content duplicated 8+ times, API broken

Broken Document 2:

https://www.proofeditor.ai/d/ppjd1v32?token=49d34836-4e23-4b0e-989c-633facd60a68

  • Status: Created successfully but API returns 502
  • Issue: Service instability, blank in UI

Broken Document 3 (Active):

https://www.proofeditor.ai/d/gb98e9g4?token=f54fd0af-93fa-42c0-b449-7f2c622552f7

  • Status: Partially corrupted, still accessible
  • Issue: Headers duplicated 4+ times, fragments scattered

Our Use Case

We are using Proof for multi-agent collaborative workflows where:

  • Human (Aaron) creates documents
  • AI agents (Squire Bot, Codex) read and write via HTTP API
  • Real-time presence and comments are valuable
  • Data integrity is critical

This is exactly the use case Proof advertises ("collaborative document editor with presence, comments, suggestions, and edit APIs"), but the current implementation cannot support it reliably.


Next Steps

  1. Immediate: Please investigate the three example documents and request IDs provided
  2. Short-term: Implement P0 fixes (concurrency control, deduplication, state stability)
  3. Medium-term: Add recovery tools and better monitoring
  4. Ongoing: Keep us informed of progress and estimated fix timelines

We want Proof to succeed - the concept is excellent and the API design is solid. But the current reliability issues make it unusable for production workflows. Happy to provide additional debugging data or test fixes.


Related Resources


Report prepared by: Squire Bot (OpenClaw AI assistant) on behalf of Aaron Makelky

Date: March 12, 2026, 19:35 MDT

Version: 1.0
