Follow-up to #96 (fixed for v1.4.1): the "re-queued forever" behavior still reproduces on v1.4.2, via a different failure path that doesn't write the error sentinel — so failed items are retried (and re-billed) on every sync.
What's different from #96
#96 fixed the extractSummary() undefined crash. But callClaude() now throws SummarizerSdkError whenever the result has is_error: true, e.g. with subtype: "success" when resume is used on a session that isn't resumable. isResumeFailure() only treats subtype === "error_during_execution" as recoverable, so any other is_error result re-throws out of summarizeConversation() and the item is left without a sentinel, which means it is re-queued on the next sync. The API request has already been made (and billed) by the time the error surfaces, and there is no backoff or retry cap.
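To make the gap concrete, here is a minimal sketch of the classification described above. The `SdkResult` shape and both function names are assumptions for illustration (only `is_error` and `subtype` come from the observed behavior); this is not the plugin's actual code.

```typescript
// Hypothetical result shape; only `is_error` and `subtype` are taken from the report.
interface SdkResult {
  is_error: boolean;
  subtype: string;
}

// Behavior as described in this issue: a single subtype counts as recoverable.
function isResumeFailureNarrow(result: SdkResult): boolean {
  return result.is_error && result.subtype === "error_during_execution";
}

// One possible broader check (an assumption, not the current implementation):
// any is_error result on a resume attempt is treated as a handled resume failure.
function isResumeFailureBroad(result: SdkResult, wasResume: boolean): boolean {
  return wasResume && result.is_error;
}

// The combination seen in the logs: is_error is true but subtype is "success".
const observed: SdkResult = { is_error: true, subtype: "success" };

console.log(isResumeFailureNarrow(observed)); // false, so the error is re-thrown
console.log(isResumeFailureBroad(observed, true)); // true, so it could be handled
```

With the narrow check, the observed result falls through to the re-throw path, which matches the "Summarizer SDK error: success" lines in the log.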
Evidence (sanitized)
Same items fail and are re-attempted across consecutive syncs:
<archive>/<file>.jsonl Summary generation failed: Summarizer SDK error: success (session <id>)
<archive>/<file>.jsonl Summary generation failed: Summarizer SDK error: success (session <id>)
… (one per item, every sync)
Representative sync result — failures, items stay unprocessed and get retried:
✅ Sync complete!
Copied: 7
Skipped: 1546
Indexed: 5
Summarized: 0
⚠️ Errors: 9
Summary generation failed: Summarizer SDK error: success appears ~25× in a single rolling log (~2 weeks). Each retry issues a paid API request, so a backlog of non-summarizable items (e.g. external/observer session files passed to resume) drives unbounded spend with no cap or alert. Compounds with reentrant syncs (cf. #88).
Suggested fix
- Write the error sentinel for all failure paths (not just extractSummary/empty branch), so failed items aren't re-queued every sync.
- Treat any is_error result (not only error_during_execution) as handled; for non-resumable sessions, fall back to non-resume instead of re-throwing.
- Add backoff + max-retries, and optionally a per-sync call/spend cap as a safety net.
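A rough sketch of how the three suggestions could fit together. Every name here (`FailureSentinel`, `recordFailure`, `shouldAttempt`, `summarizeSafely`, `MAX_RETRIES`, `BASE_DELAY_MS`) is hypothetical, and the in-memory `Map` stands in for wherever the plugin actually persists its sentinel. It is meant to illustrate the policy, not to mirror the real code.

```typescript
// All names below are hypothetical; they only illustrate the suggested policy.
interface FailureSentinel {
  failures: number;      // failed attempts recorded so far
  nextAttemptAt: number; // epoch ms before which the item is skipped
  gaveUp: boolean;       // true once MAX_RETRIES is reached
}

const MAX_RETRIES = 3;        // assumed cap
const BASE_DELAY_MS = 60_000; // assumed base delay, doubled per failure

// Record a failure for any error path, with exponential backoff.
function recordFailure(
  store: Map<string, FailureSentinel>,
  itemId: string,
  now: number,
): FailureSentinel {
  const failures = (store.get(itemId)?.failures ?? 0) + 1;
  const sentinel: FailureSentinel = {
    failures,
    nextAttemptAt: now + BASE_DELAY_MS * 2 ** (failures - 1),
    gaveUp: failures >= MAX_RETRIES,
  };
  store.set(itemId, sentinel);
  return sentinel;
}

// Decide whether an item may trigger a paid call during this sync.
function shouldAttempt(
  store: Map<string, FailureSentinel>,
  itemId: string,
  now: number,
  callsThisSync: number,
  maxCallsPerSync: number,
): boolean {
  if (callsThisSync >= maxCallsPerSync) return false; // per-sync safety cap
  const sentinel = store.get(itemId);
  if (!sentinel) return true; // never failed before
  return !sentinel.gaveUp && now >= sentinel.nextAttemptAt;
}

// Try resume first, then fall back to a non-resume call. If both fail,
// write the sentinel and return null instead of letting the error escape.
async function summarizeSafely(
  store: Map<string, FailureSentinel>,
  itemId: string,
  now: number,
  call: (resume: boolean) => Promise<string>,
): Promise<string | null> {
  for (const resume of [true, false]) {
    try {
      return await call(resume);
    } catch {
      // Any failure is treated as handled; move on to the next mode.
    }
  }
  recordFailure(store, itemId, now);
  return null;
}
```

Note that the fallback itself costs a second request, so the per-sync cap should count both calls. Whether a given-up item is ever retried (e.g. after a plugin upgrade) is a separate design decision.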