
fix: prevent simulation destruction on page refresh + add dashboard view#30

Open
hassanpasha5630 wants to merge 7 commits into nikmcfly:main from hassanpasha5630:fix/simulation-lifecycle-and-dashboard

Conversation

@hassanpasha5630 hassanpasha5630 commented Apr 3, 2026

Summary

While self-hosting MiroFish-Offline on a 4x GPU Linux server, we discovered that refreshing the browser kills the running simulation and permanently deletes all data. This PR fixes the root cause (two compounding bugs) plus related issues found during debugging, adds a dashboard view to monitor simulations, and fixes orphaned simulation recovery on backend restart.

The kill chain (before this PR)

  1. Step3Simulation.vue onMounted() calls doStartSimulation() unconditionally on every page load
  2. doStartSimulation() has force: true hardcoded, telling the backend to kill any running process
  3. Backend kills the simulation subprocess, deletes all runtime files (action logs, SQLite DBs, run state)
  4. Backend starts a new simulation from round 0
  5. All previous progress is permanently lost

Bug fixes

  • Frontend: check before starting — onMounted now calls getRunStatus() first; if the simulation is already running, it resumes status polling instead of restarting (Step3Simulation.vue)
  • Frontend: don't force by default — changed force: true to force: false in the default start path (Step3Simulation.vue)
  • Backend: pass GraphStorage to subprocess — the /start endpoint was calling SimulationRunner.start_simulation() without the storage parameter, so GraphMemoryUpdater threw "Must provide storage" and graph updates silently failed. Now fetches neo4j_storage from Flask context and passes it through (simulation.py)
  • Backend: preserve state.json on "already running" — when /start returns 400 for an already-running simulation, state.json is now saved with status: "running" before returning the error, preventing a desync where the frontend sees "ready" and retries in a loop (simulation.py)
  • Backend: clear Neo4j on force-restart — cleanup_simulation_logs() now accepts optional storage and graph_id parameters to clear stale graph data during force-restart (simulation_runner.py)
  • Backend: reconnect to orphaned simulations on restart — when the backend restarts (crash, debug auto-reload, manual restart), the monitor thread that updates run_state.json dies while the simulation subprocess keeps running. On startup, SimulationRunner.reconnect_orphaned_simulations() now scans for simulations with runner_status="running", checks if the PID is still alive, and starts a new monitor thread to resume reading action logs and updating state. Dead processes are marked as stopped. (simulation_runner.py, __init__.py)
  • Backend: reconnect GraphMemoryUpdater for orphaned simulations — the orphan reconnect was recovering the monitor thread but not the GraphMemoryUpdater, so graph updates stopped after any backend restart. Now reads graph_id from state.json, creates a fresh Neo4jStorage connection, and restarts the updater so simulation actions continue flowing into the knowledge graph. (simulation_runner.py)
  • Backend: fix duplicate graph episodes on reconnect — the orphan monitor started reading action logs from position 0 on every backend restart, re-feeding all actions to GraphMemoryUpdater and creating duplicate episodes in Neo4j. Now starts from end of existing log files so only new actions are processed. (simulation_runner.py)
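The orphan-reconnect logic hinges on checking whether the PID recorded in run_state.json still refers to a live process. A minimal sketch of that scan (hypothetical helper names; the real SimulationRunner.reconnect_orphaned_simulations() in simulation_runner.py also restarts the monitor thread):

```python
import json
import os
from pathlib import Path

def pid_is_alive(pid: int) -> bool:
    """Signal 0 checks existence/permissions without actually signalling."""
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    return True

def scan_orphaned_runs(runs_dir: str) -> dict:
    """Classify persisted runs: still-alive (reconnect) vs dead (mark stopped)."""
    alive, dead = [], []
    for state_file in Path(runs_dir).glob("*/run_state.json"):
        state = json.loads(state_file.read_text())
        if state.get("runner_status") != "running":
            continue  # completed/stopped runs need no recovery
        if pid_is_alive(state.get("pid", -1)):
            alive.append(state_file.parent.name)   # resume monitoring
        else:
            dead.append(state_file.parent.name)    # process died with the backend
    return {"reconnect": alive, "mark_stopped": dead}
```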

Additional fixes (found during debugging)

  • Bumped default LLM context window from 8192 to 32768 tokens (llm_client.py)
  • Added traceback logging for ontology generation failures (graph.py)
  • Filter malformed entity/edge types missing the name key before validation (ontology_generator.py)
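The last fix above amounts to a defensive filter; a sketch of the idea (assuming the generator yields plain dicts, which is an assumption about ontology_generator.py's internals):

```python
def filter_typed(items: list[dict]) -> list[dict]:
    """Drop entity/edge type definitions without a usable 'name' key,
    so downstream validation never sees a malformed entry."""
    return [t for t in items if isinstance(t, dict) and t.get("name")]
```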

New feature: Simulation Dashboard (/dashboard)

Built to verify that the bug fixes were working — turned out to be a useful feature:

  • Active Now section with live-updating cards for running simulations (progress bars, round counts, stop/view actions, 3s polling)
  • All Simulations history table with search input and status filter tabs (All / Running / Completed / Stopped)
  • Navigation to Graph, Simulation Run, and Report views per simulation
  • Matches existing design language (Space Grotesk, JetBrains Mono, custom CSS, no new dependencies)
  • Link added to Home navbar

Test plan

  • Start a simulation, refresh the browser — simulation continues running (not killed)
  • Dashboard shows running simulation with live progress updates
  • GraphMemoryUpdater writes to Neo4j during simulation (verified: 51 entities, 20 relations, 25 episodes)
  • Calling /start with force: false on a running simulation returns 400 without killing it
  • Restart backend while simulation is running — backend reconnects and resumes monitoring automatically
  • Restart backend — GraphMemoryUpdater reconnects and graph updates resume
  • Restart backend multiple times — no duplicate graph episodes created
  • Dashboard filter tabs and search work correctly
  • Navigation from dashboard cards to simulation/report views works

Browser refresh was killing running simulations due to two compounding bugs:
the frontend unconditionally called /start on mount with force=true hardcoded,
nuking the running process and deleting all data files every time.

Bug fixes:
- Frontend: check run-status before starting; only start if not already running
- Frontend: change force flag from true to false in default start path
- Backend: pass GraphStorage to SimulationRunner.start_simulation() so graph
  memory updates actually work (was failing silently with "Must provide storage")
- Backend: preserve state.json as "running" when /start returns 400 for
  already-running sim (prevents frontend retry loop from state desync)
- Backend: clear Neo4j graph data during force-restart cleanup (was leaving
  stale nodes/edges from previous runs)
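The state-desync guard can be reduced to a pure decision (hypothetical helper; the real /start handler in simulation.py additionally persists state.json and manages the subprocess):

```python
def decide_start(is_running: bool, force: bool) -> str:
    """Return what /start should do for the current process state.
    'reject_and_persist_running' means: return 400, but first rewrite
    state.json with status "running" so a polling frontend doesn't see
    a stale "ready" and retry in a loop."""
    if is_running and not force:
        return "reject_and_persist_running"
    if is_running and force:
        return "kill_then_start"
    return "start"
```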

Additional fixes applied during debugging:
- Bump default LLM context window from 8192 to 32768 tokens
- Add traceback logging for ontology generation failures
- Filter malformed entity/edge types missing 'name' key

New feature — simulation dashboard (/dashboard):
- "Active Now" section with live-updating cards for running simulations
  (progress bars, round counts, stop/view actions, 3s polling)
- "All Simulations" history table with search and status filter tabs
- Added to help verify the bug fixes were working correctly

Made-with: Cursor
When the backend restarts (crash, manual restart, debug auto-reload),
the monitor thread that reads action logs and updates run_state.json
dies, but the simulation subprocess survives. This left simulations
in a "running" state with stale progress data.

On startup, SimulationRunner now scans for run_state.json files with
runner_status="running", checks if the PID is still alive, and starts
a lightweight monitor thread that reads action logs and updates state.
Dead processes are marked as stopped.

Made-with: Cursor
The orphan reconnect was recovering the monitor thread (for run_state
updates) but not the GraphMemoryUpdater, so graph updates stopped after
any backend restart. Now reads graph_id from state.json, creates a fresh
Neo4jStorage connection, and restarts the updater so simulation actions
continue flowing into the knowledge graph.

Made-with: Cursor
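The updater reconnect described in the commit above can be sketched as follows; storage_factory and updater_factory stand in for the real Neo4jStorage and GraphMemoryUpdater constructors, whose exact signatures are assumptions here:

```python
import json
from pathlib import Path

def reconnect_graph_updater(run_dir, storage_factory, updater_factory):
    """Rebuild the graph pipeline for an orphaned run (sketch).
    Reads graph_id from the run's state.json, opens a fresh storage
    connection, and hands both to a new updater instance."""
    state = json.loads((Path(run_dir) / "state.json").read_text())
    graph_id = state.get("graph_id")
    if not graph_id:
        return None  # run had no graph memory attached; nothing to restore
    storage = storage_factory()          # fresh connection after restart
    return updater_factory(storage, graph_id)
```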
- vite.config.js: host 0.0.0.0, port 3001, ngrok allowedHosts
- api/index.js: empty baseURL for proxy-based deployment
- future_features.md: proprietary feature roadmap (event injection,
  simulation resume, narrative stacking, comparative runs)

Made-with: Cursor
onMounted only checked for runner_status='running'. When a completed
simulation was visited, it tried to restart it (got 400 rejected) and
showed a blank "WAITING FOR AGENT ACTIONS" state. Now handles completed
and stopped states by loading results directly (phase 2).

Made-with: Cursor
When visiting a completed simulation, the action feed showed "WAITING
FOR AGENT ACTIONS" because fetchRunStatusDetail() was never called.
Now fetches detail data once on load so the action timeline populates.

Made-with: Cursor
_monitor_orphaned_simulation started reading action logs from position 0
on every backend restart, re-feeding all actions to GraphMemoryUpdater
and creating duplicate episodes. Now starts from end of existing log
files so only new actions are processed.

Made-with: Cursor
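The seek-to-end fix above boils down to remembering a file offset instead of always reading from position 0. A minimal sketch (not the real _monitor_orphaned_simulation, which also parses actions and feeds GraphMemoryUpdater):

```python
import os

class LogFollower:
    """Follow an action log incrementally. With from_end=True, a
    reconnecting monitor skips everything already in the file, so old
    actions are never re-fed into the graph as duplicate episodes."""

    def __init__(self, path: str, from_end: bool = True):
        self.path = path
        self.pos = os.path.getsize(path) if (from_end and os.path.exists(path)) else 0

    def read_new_lines(self) -> list[str]:
        """Return lines appended since the last call, advancing the offset."""
        with open(self.path, "r") as f:
            f.seek(self.pos)
            lines = f.readlines()
            self.pos = f.tell()
        return [ln.rstrip("\n") for ln in lines]
```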