Skip to content

fix: prevent action data loss from partial line reads in _read_action_log#460

Open
voidborne-d wants to merge 1 commit into666ghj:mainfrom
voidborne-d:fix/action-log-partial-line-data-loss
Open

fix: prevent action data loss from partial line reads in _read_action_log#460
voidborne-d wants to merge 1 commit into666ghj:mainfrom
voidborne-d:fix/action-log-partial-line-data-loss

Conversation

@voidborne-d
Copy link
Copy Markdown

Problem

During active simulations, the monitor thread can permanently lose agent actions from the JSONL action log (twitter/actions.jsonl, reddit/actions.jsonl).

Root cause

_read_action_log() uses for line in f to iterate the log file while the simulation subprocess is actively writing to it. When the writer hasn't flushed a complete line yet (no trailing \n), Python's file iterator returns the partial data as a line. The sequence:

  1. Writer appends: {"agent_id": 1, "action_type": "CRE (partial, no \n yet)
  2. Reader iterates, gets partial line → json.loads fails → silently skipped (except json.JSONDecodeError: pass)
  3. Reader returns f.tell() which is past the partial data
  4. Writer completes: ATE_POST", "round": 1}\n
  5. Next poll starts from the new position (mid-line), reads: ATE_POST", "round": 1}\njson.loads fails again
  6. ❌ The action is permanently lost

This affects any simulation with concurrent write/read — which is every running simulation, since the monitor thread polls every 2 seconds while the subprocess writes actions continuously.

Additional issue

The original code also called f.tell() after a for line in f loop. In CPython, tell() is disabled inside a file iterator and raises OSError: telling position disabled by next() call. The current code only avoids this because f.tell() is called after the loop exits — but it reports the position at EOF, including any partial data.

Fix

  • Replace for line in f with f.readline() loop (which supports interleaved f.tell() calls)
  • Track safe_position: only advance after reading a complete line (ends with \n)
  • Partial lines (no trailing \n) cause the reader to stop and retry on the next poll — no data is skipped
# Before (buggy):
for line in f:
    # ... process line ...
return f.tell()  # may be past partial data!

# After (safe):
safe_position = position
while True:
    line = f.readline()
    if not line:
        break  # EOF
    if not line.endswith("\n"):
        break  # partial line — retry next poll
    safe_position = f.tell()
    # ... process line ...
return safe_position  # never past incomplete data

Tests

Added 16 tests covering:

Category Tests
Partial line regression test_partial_line_not_consumed, test_partial_line_recovered_on_next_poll, test_multiple_partial_writes_no_loss
Basic functionality test_empty_file, test_single_complete_action, test_multiple_actions, test_event_lines_parsed, test_simulation_end_event, test_incremental_reading
Edge cases test_malformed_json_skipped, test_blank_lines_skipped, test_unicode_content, test_nonexistent_file, test_reddit_platform, test_only_partial_line_no_advance
Concurrent simulation test_writer_mid_line_then_completes

All 16 tests pass.

Changed files

  • backend/app/services/simulation_runner.py — fix _read_action_log() to use readline() + safe_position tracking
  • backend/tests/test_read_action_log.py — 16 new tests (with lightweight dependency stubs)
  • backend/tests/__init__.py — new test package

…_log

When the simulation subprocess writes a JSONL action line, the monitor
thread may read it before the write is flushed (no trailing newline).
The old code used `for line in f` + `f.tell()` which advanced past
the partial data. On the next poll the reader started mid-line, causing
both halves to fail json.loads — the action was permanently lost.

Root cause: `for line in f` returns partial lines at EOF (without \n)
and then `f.tell()` reports a position past the incomplete data.

Fix:
- Switch from `for line in f` to `f.readline()` (which supports
  interleaved `f.tell()` calls)
- Track `safe_position`: only advance after reading a complete line
  (one that ends with \n)
- Partial lines (no trailing \n) cause the reader to stop and retry
  on the next poll cycle — no data is skipped

Added 16 tests covering:
- Partial line not consumed (core regression)
- Partial line recovered on next poll
- Multiple partial writes with no loss
- Concurrent writer simulation
- Basic action/event reading
- Edge cases (malformed JSON, blank lines, Unicode, nonexistent file)
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant