Description
Bug
PGSessionStore.Save() uses UPDATE sessions ... WHERE session_key = $19. If the session row does not exist in the DB yet (first run of a cron job), the UPDATE affects 0 rows and Save() returns without an error. However, AddMessage() has already added messages to the in-memory cache, so the cache and the database diverge.
This causes two problems:
- Cron sessions never persist: first-run cron sessions exist only in memory, lost on restart
- Stale cache on retry: when ExecuteWithRetry retries a failed cron job, the next attempt reads the stale in-memory cache from the failed attempt, polluting the conversation history with leftover messages (e.g., empty user message + "..." assistant fallback)
Root Cause
// sessions_list.go
func (s *PGSessionStore) Save(key string) error {
	// ...
	_, err := s.db.Exec(
		`UPDATE sessions SET messages = $1, ... WHERE session_key = $19`,
		// ...
	)
	return err // UPDATE 0 rows → no error, but session not persisted
}

The session row is created via a separate createSession() path (e.g., GetOrCreateSession), but the cron agent loop calls AddMessage() + Save() without ensuring the row exists first.
Impact
- Cron job retries (via ExecuteWithRetry) accumulate stale messages in cache across attempts
- The stale messages cause LLM 400 errors when the proxy converts empty content blocks to Anthropic format (messages.0.content.0.text.text: Field required)
- Jobs that fail once tend to keep failing on all retry attempts due to cache pollution
Suggested Fix
Option A: Save() should use INSERT ... ON CONFLICT (session_key) DO UPDATE (UPSERT)
Option B: Clear in-memory cache for the session key when the cron handler starts a new attempt (reset between retries)
Option C: Have the cron handler call GetOrCreateSession() before running the agent loop to ensure the DB row exists
Steps to Reproduce
- Create a cron job with a valid schedule
- Ensure the cron agent makes tool calls but fails at iteration 2 (e.g., LLM error)
- ExecuteWithRetry retries the job
- The retry reads stale history from cache (empty user + "..." assistant from the failed attempt)
- This stale history causes the LLM request to fail again
- All 4 attempts (the initial run plus 3 retries) fail with the same error
Environment
- GoClaw v1.74.1
- PostgreSQL-backed session store (PGSessionStore)
- Cron retry config: 3 retries with backoff