fix: prevent simulation destruction on page refresh + add dashboard view #30
Open
hassanpasha5630 wants to merge 7 commits into nikmcfly:main
Browser refresh was killing running simulations due to two compounding bugs: the frontend unconditionally called /start on mount, and that call had force=true hardcoded, nuking the running process and deleting all data files every time.

Bug fixes:
- Frontend: check run-status before starting; only start if not already running
- Frontend: change force flag from true to false in default start path
- Backend: pass GraphStorage to SimulationRunner.start_simulation() so graph memory updates actually work (was failing silently with "Must provide storage")
- Backend: preserve state.json as "running" when /start returns 400 for already-running sim (prevents frontend retry loop from state desync)
- Backend: clear Neo4j graph data during force-restart cleanup (was leaving stale nodes/edges from previous runs)

Additional fixes applied during debugging:
- Bump default LLM context window from 8192 to 32768 tokens
- Add traceback logging for ontology generation failures
- Filter malformed entity/edge types missing 'name' key

New feature: simulation dashboard (/dashboard):
- "Active Now" section with live-updating cards for running simulations (progress bars, round counts, stop/view actions, 3s polling)
- "All Simulations" history table with search and status filter tabs
- Added to help verify the bug fixes were working correctly

Made-with: Cursor
When the backend restarts (crash, manual restart, debug auto-reload), the monitor thread that reads action logs and updates run_state.json dies, but the simulation subprocess survives. This left simulations in a "running" state with stale progress data. On startup, SimulationRunner now scans for run_state.json files with runner_status="running", checks if the PID is still alive, and starts a lightweight monitor thread that reads action logs and updates state. Dead processes are marked as stopped. Made-with: Cursor
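For reviewers, here is a minimal sketch of the startup scan this commit describes, not the code in the PR. It assumes a hypothetical layout of `<root>/<simulation_id>/run_state.json` and a hypothetical `pid` field; only `runner_status="running"` comes from the commit message. The helper names `pid_alive` and `find_orphans` are illustrative. The liveness check relies on POSIX signal semantics, which matches the Linux deployment mentioned in the PR; it should not be used on Windows.

```python
import json
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence/permissions, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def find_orphans(sim_root: Path) -> dict[str, list[str]]:
    """Classify simulations whose run_state.json still says "running"."""
    result: dict[str, list[str]] = {"reattach": [], "mark_stopped": []}
    for state_file in sorted(sim_root.glob("*/run_state.json")):
        try:
            state = json.loads(state_file.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable state files are skipped in this sketch
        if state.get("runner_status") != "running":
            continue
        pid = state.get("pid")  # hypothetical field name
        alive = isinstance(pid, int) and pid_alive(pid)
        result["reattach" if alive else "mark_stopped"].append(state_file.parent.name)
    return result
```

Simulations listed under `reattach` would get a new monitor thread; those under `mark_stopped` would have their state file rewritten as stopped.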
The orphan reconnect was recovering the monitor thread (for run_state updates) but not the GraphMemoryUpdater, so graph updates stopped after any backend restart. Now reads graph_id from state.json, creates a fresh Neo4jStorage connection, and restarts the updater so simulation actions continue flowing into the knowledge graph. Made-with: Cursor
- vite.config.js: host 0.0.0.0, port 3001, ngrok allowedHosts - api/index.js: empty baseURL for proxy-based deployment - future_features.md: proprietary feature roadmap (event injection, simulation resume, narrative stacking, comparative runs) Made-with: Cursor
onMounted only checked for runner_status='running'. When a completed simulation was visited, it tried to restart it (got 400 rejected) and showed a blank "WAITING FOR AGENT ACTIONS" state. Now handles completed and stopped states by loading results directly (phase 2). Made-with: Cursor
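The mount-time decision described here can be summarised as a small pure function. This is a sketch in Python rather than the component's JavaScript, and it is not the code in Step3Simulation.vue. The status string "running" comes from the commit message; the exact strings "completed" and "stopped", and the names `MountAction` and `decide_on_mount`, are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class MountAction(Enum):
    RESUME_POLLING = "resume_polling"  # simulation is live: just watch it
    LOAD_RESULTS = "load_results"      # finished or stopped: show results (phase 2)
    START = "start"                    # nothing to resume: start with force=False


def decide_on_mount(runner_status: Optional[str]) -> MountAction:
    """Pick what the page should do when it loads, based on the run status."""
    if runner_status == "running":
        return MountAction.RESUME_POLLING
    if runner_status in ("completed", "stopped"):  # assumed status strings
        return MountAction.LOAD_RESULTS
    return MountAction.START
```

Keeping the decision separate from the side effects makes the three branches easy to unit-test without mounting the component.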
When visiting a completed simulation, the action feed showed "WAITING FOR AGENT ACTIONS" because fetchRunStatusDetail() was never called. Now fetches detail data once on load so the action timeline populates. Made-with: Cursor
_monitor_orphaned_simulation started reading action logs from position 0 on every backend restart, re-feeding all actions to GraphMemoryUpdater and creating duplicate episodes. Now starts from end of existing log files so only new actions are processed. Made-with: Cursor
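The "start from end of existing log files" behaviour can be illustrated with a small tailing helper. This is a sketch under assumptions, not the implementation in simulation_runner.py: the class name `LogTail`, a newline-delimited log format, and UTF-8 encoding are all hypothetical.

```python
from pathlib import Path


class LogTail:
    """Returns only the complete lines appended after the tail was created."""

    def __init__(self, path: Path, from_end: bool = True) -> None:
        self.path = path
        # Starting at the current file size skips everything already processed.
        self.offset = path.stat().st_size if from_end and path.exists() else 0

    def read_new_lines(self) -> list[str]:
        if not self.path.exists():
            return []
        with self.path.open("rb") as fh:
            fh.seek(self.offset)
            chunk = fh.read()
        # Consume only up to the last newline; a half-written line is
        # left in place and picked up on the next poll.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return chunk[:end].decode("utf-8").splitlines()
```

With `from_end=False` the same class reproduces the old behaviour of replaying the whole log, which is what caused the duplicate episodes.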
Summary
While self-hosting MiroFish-Offline on a 4x GPU Linux server, we discovered that refreshing the browser kills the running simulation and permanently deletes all data. This PR fixes the root cause (two compounding bugs) plus related issues found during debugging, adds a dashboard view to monitor simulations, and fixes orphaned simulation recovery on backend restart.
The kill chain (before this PR)
- Step3Simulation.vue onMounted() calls doStartSimulation() unconditionally on every page load
- doStartSimulation() has force: true hardcoded, telling the backend to kill any running process
Bug fixes
- onMounted now calls getRunStatus() first; if the simulation is already running, it resumes status polling instead of restarting (Step3Simulation.vue)
- Changed force: true to force: false in the default start path (Step3Simulation.vue)
- The /start endpoint was calling SimulationRunner.start_simulation() without the storage parameter, so GraphMemoryUpdater threw "Must provide storage" and graph updates silently failed. It now fetches neo4j_storage from the Flask context and passes it through (simulation.py)
- When /start returns 400 for an already-running simulation, state.json is now saved with status: "running" before returning the error, preventing a desync where the frontend sees "ready" and retries in a loop (simulation.py)
- cleanup_simulation_logs() now accepts optional storage and graph_id parameters to clear stale graph data during force-restart (simulation_runner.py)
- When the backend restarts, the monitor thread that updates run_state.json dies while the simulation subprocess keeps running. On startup, SimulationRunner.reconnect_orphaned_simulations() now scans for simulations with runner_status="running", checks if the PID is still alive, and starts a new monitor thread to resume reading action logs and updating state. Dead processes are marked as stopped. (simulation_runner.py, __init__.py)
- The orphan reconnect recovered the monitor thread but not the GraphMemoryUpdater, so graph updates stopped after any backend restart. It now reads graph_id from state.json, creates a fresh Neo4jStorage connection, and restarts the updater so simulation actions continue flowing into the knowledge graph. (simulation_runner.py)
- The orphan monitor now starts from the end of existing action log files instead of position 0, so only new actions are fed to GraphMemoryUpdater and no duplicate episodes are created (simulation_runner.py)
Additional fixes (found during debugging)
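- Bumped the default LLM context window from 8192 to 32768 tokens (llm_client.py)
- Added traceback logging for ontology generation failures (graph.py)
- Filter out malformed entity/edge types missing the name key before validation (ontology_generator.py)
New feature: Simulation Dashboard (/dashboard)
Built to verify that the bug fixes were working; it turned out to be a useful feature:
- "Active Now" section with live-updating cards for running simulations (progress bars, round counts, stop/view actions, 3s polling)
- "All Simulations" history table with search and status filter tabs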
Test plan
- /start with force: false on a running simulation returns 400 without killing it
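The behaviour under test can be modelled as a pure decision function, which makes the expected outcomes explicit. This is a sketch and not the Flask handler in simulation.py: `StartDecision` and `handle_start` are hypothetical names, and the 200 success code is an assumption. The 400 response and the "running" status written to state.json come from the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartDecision:
    http_status: int       # status code the endpoint would return
    kill_existing: bool    # whether the running process gets terminated
    persisted_status: str  # value written to state.json


def handle_start(is_running: bool, force: bool) -> StartDecision:
    """Decide how /start should respond for a given run state and force flag."""
    if is_running and not force:
        # Reject the request, leave the process alone, and record "running"
        # so the frontend does not see a stale "ready" and retry in a loop.
        return StartDecision(400, False, "running")
    # Either nothing is running, or the caller explicitly asked for a restart.
    return StartDecision(200, is_running and force, "running")
```

Only the combination of a running simulation and `force=True` terminates anything, which is why changing the frontend default to `force: false` removes the refresh-kills-simulation path.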