Session Save() uses UPDATE-only — cron job sessions never persist, retry reads stale cache #379

@duhd-vnpay

Bug

PGSessionStore.Save() uses UPDATE sessions ... WHERE session_key = $19. If the session row does not exist in the DB yet (first run of a cron job), the UPDATE affects 0 rows and Save() returns without an error. By that point AddMessage() has already appended the messages to the in-memory cache, so the cache and the database diverge.

This causes two problems:

  1. Cron sessions never persist — first-run cron sessions exist only in memory, lost on restart
  2. Stale cache on retry — when ExecuteWithRetry retries a failed cron job, the next attempt reads the stale in-memory cache from the failed attempt, polluting the conversation history with leftover messages (e.g., empty user message + "..." assistant fallback)
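
To make the divergence concrete, here is a minimal in-memory model of the behavior described above. The `store` type, its fields, and the `cron:job-1` key are hypothetical stand-ins for illustration only; they are not GoClaw's actual types.

```go
package main

import "fmt"

// Message and store are hypothetical stand-ins, not GoClaw's real types.
type Message struct{ Role, Content string }

type store struct {
	rows  map[string][]Message // plays the role of the sessions table
	cache map[string][]Message // plays the role of the in-memory cache
}

func newStore() *store {
	return &store{rows: map[string][]Message{}, cache: map[string][]Message{}}
}

// AddMessage only touches the cache.
func (s *store) AddMessage(key string, m Message) {
	s.cache[key] = append(s.cache[key], m)
}

// Save mimics an UPDATE-only write: it reports how many rows changed
// and never creates a row.
func (s *store) Save(key string) (rowsAffected int) {
	if _, ok := s.rows[key]; !ok {
		return 0 // no row matches the WHERE clause, so nothing is written
	}
	s.rows[key] = append([]Message(nil), s.cache[key]...)
	return 1
}

func main() {
	s := newStore()
	s.AddMessage("cron:job-1", Message{Role: "user", Content: ""})
	n := s.Save("cron:job-1")
	_, persisted := s.rows["cron:job-1"]
	// Rows affected, whether a row exists, and how many messages sit in cache.
	fmt.Println(n, persisted, len(s.cache["cron:job-1"])) // prints: 0 false 1
}
```

The message survives only in `cache`, which is the state a retry would then read.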

Root Cause

// sessions_list.go
func (s *PGSessionStore) Save(key string) error {
    // ...
    _, err := s.db.Exec(
        `UPDATE sessions SET messages = $1, ... WHERE session_key = $19`,
        // ...
    )
    return err  // UPDATE 0 rows → no error, but session not persisted
}

The session row is created via a separate createSession() path (e.g., GetOrCreateSession), but the cron agent loop calls AddMessage() + Save() without ensuring the row exists first.
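
Independent of which fix is chosen, Save() could at least surface the zero-row case instead of returning nil. The sketch below uses `sql.Result.RowsAffected()` from the standard library; `ensureUpdated`, `ErrSessionNotPersisted`, and `fakeResult` are hypothetical names introduced here, and `fakeResult` exists only so the example runs without a database.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// ErrSessionNotPersisted is a hypothetical sentinel error, not part of GoClaw.
var ErrSessionNotPersisted = errors.New("session row missing: UPDATE affected 0 rows")

// ensureUpdated turns a zero-row UPDATE into an explicit error.
func ensureUpdated(res sql.Result) error {
	n, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if n == 0 {
		return ErrSessionNotPersisted
	}
	return nil
}

// fakeResult stands in for a driver result and reports a fixed row count.
type fakeResult int64

func (r fakeResult) LastInsertId() (int64, error) { return 0, nil }
func (r fakeResult) RowsAffected() (int64, error) { return int64(r), nil }

func main() {
	fmt.Println(ensureUpdated(fakeResult(0))) // zero rows: reports the sentinel error
	fmt.Println(ensureUpdated(fakeResult(1))) // one row: <nil>
}
```

In Save(), this would mean keeping the first return value of `s.db.Exec(...)` rather than discarding it with `_`.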

Impact

  • Cron job retries (via ExecuteWithRetry) accumulate stale messages in cache across attempts
  • The stale messages cause LLM 400 errors when the proxy converts empty content blocks to Anthropic format (messages.0.content.0.text.text: Field required)
  • Jobs that fail once tend to keep failing on all retry attempts due to cache pollution

Suggested Fix

Option A: Save() should use INSERT ... ON CONFLICT (session_key) DO UPDATE (UPSERT)

Option B: Clear in-memory cache for the session key when the cron handler starts a new attempt (reset between retries)

Option C: Have the cron handler call GetOrCreateSession() before running the agent loop to ensure the DB row exists
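
For Option A, a rough sketch of what an UPSERT-based save could look like. The statement is deliberately reduced to two columns (the real Save() binds many more, which are not shown in this report), it assumes a unique constraint on `session_key`, and `saveSession`, `execer`, and `recordingExec` are hypothetical names used only for illustration.

```go
package main

import (
	"database/sql"
	"fmt"
)

// upsertSQL is a reduced, hypothetical statement. ON CONFLICT requires a
// unique constraint or index on session_key.
const upsertSQL = `
INSERT INTO sessions (session_key, messages)
VALUES ($1, $2)
ON CONFLICT (session_key) DO UPDATE
SET messages = EXCLUDED.messages`

// execer is the subset of *sql.DB this sketch needs; *sql.DB and *sql.Tx
// both satisfy it.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// saveSession inserts the row if it is missing and updates it otherwise.
func saveSession(db execer, key string, messagesJSON []byte) error {
	_, err := db.Exec(upsertSQL, key, messagesJSON)
	return err
}

// recordingExec is a test double so the sketch runs without a database.
type recordingExec struct {
	query string
	args  []any
}

func (r *recordingExec) Exec(query string, args ...any) (sql.Result, error) {
	r.query, r.args = query, args
	return nil, nil
}

func main() {
	r := &recordingExec{}
	err := saveSession(r, "cron:job-1", []byte(`[]`))
	fmt.Println(err, len(r.args)) // prints: <nil> 2
}
```

With this shape, a first-run cron session is created on its first Save() and no separate creation path is needed.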

Steps to Reproduce

  1. Create a cron job with a valid schedule
  2. Ensure the cron agent makes tool calls but fails at iteration 2 (e.g., LLM error)
  3. ExecuteWithRetry retries the job
  4. The retry reads stale history from cache (empty user + "..." assistant from the failed attempt)
  5. This stale history causes the LLM request to fail again
  6. All 4 attempts (the initial run plus 3 retries) fail with the same error

Environment

  • GoClaw v1.74.1
  • PostgreSQL-backed session store (PGSessionStore)
  • Cron retry config: 3 retries with backoff
