This project implements a simulated spider with a modular brain, explicit predator pressure, standardized neural interfaces, online learning, deterministic behavioral scenarios, and reproducible evaluation workflows.
This repository is an experimental research codebase. It is appropriate for research iteration, benchmark packaging, and publication-facing result bundles, but it should not be read as a claim of production stability or a stable public API.
In this repo, "publication-grade" means benchmark rigor and reproducibility for benchmark-of-record outputs. It does not mean long-term interface stability.
The benchmark-of-record artifacts are the benchmark package plus the run summary
and exported behavior rows used to build it. The package-of-record includes
benchmark_manifest.json, resolved_config.json, pip_freeze.txt,
seed_level_rows.csv, aggregate and claim tables, plots, and limitations.txt.
See RELEASE.md for the checklist and release posture details, CHANGELOG.md for reproducibility-relevant changes, and CITATION.cff for citation guidance.
If you want the shortest trustworthy first run, use the smoke budget from the repository root:
Python 3.10+ is required for the full simulator and CLI. The command examples
use PYTHON_BIN so the interpreter is selected in one place:
export PYTHON_BIN="${PYTHON_BIN:-python3.10}"
$PYTHON_BIN --version
If that interpreter is unavailable, install Python 3.10+ or set PYTHON_BIN to
your available versioned interpreter, such as python3.11.
The README command examples assume you run them from the repository root. From
there, Python can import spider_cortex_sim without a PYTHONPATH=. prefix. If
you run commands from another directory, set PYTHONPATH to the repository root
first.
$PYTHON_BIN -m spider_cortex_sim --budget-profile smoke --summary spider_summary.json
This runs a short training-plus-evaluation cycle and writes
./spider_summary.json. The smoke budget is intended to finish in under a
minute on a local development machine.
For a quick success check, open spider_summary.json and look at:
- config.budget for the resolved budget profile, benchmark strength, and seed
- training_last_window.survival_rate for survival over the final training window
- evaluation.mean_reward for average reward during evaluation
- evaluation.survival_rate for post-training survival performance
- parameter_norms for the final parameter magnitudes of each module
Add --trace spider_trace.jsonl if you want a per-tick evaluation trace for
deeper inspection.
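The quick success check above can be scripted. The following is a minimal sketch, assuming the dotted names listed above map to nested keys in the summary JSON; the helper names (`get_path`, `quick_check`, `load_summary`) are hypothetical and not part of the repository.

```python
import json
from pathlib import Path

# Dotted paths named in the quick success check above.
QUICK_CHECK_FIELDS = [
    "config.budget",
    "training_last_window.survival_rate",
    "evaluation.mean_reward",
    "evaluation.survival_rate",
    "parameter_norms",
]


def get_path(data, dotted):
    """Walk a nested dict along a dotted path; return None if a key is missing."""
    node = data
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def quick_check(summary):
    """Return {dotted_path: value}, with None marking a missing field."""
    return {path: get_path(summary, path) for path in QUICK_CHECK_FIELDS}


def load_summary(path="spider_summary.json"):
    """Load a summary file written with --summary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

After the smoke run, `quick_check(load_summary())` returns one entry per field; a `None` value suggests the field is absent or the nesting assumption does not hold for that key.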
- Just run the simulator: start with Quick Start or add --render-eval for an ASCII replay
- Run with the GUI: add --gui; see Graphical Interface (Pygame)
- Inspect behavior scenarios or the behavior suite: see Behavioral Evaluation and docs/behavior_gate_diagnostics.md
- Run claim tests: see Claim Test Suite
- Run ablations or learning-evidence workflows: see Ablations And Learning Evidence and docs/ablation_workflow.md
- Diagnose progressive modularity before adding new centers: see docs/architectural_ladder.md
- Generate an offline analysis bundle: $PYTHON_BIN -m spider_cortex_sim.offline_analysis --summary spider_summary.json --output-dir ./report/; see Offline Analysis
- Usage or interpretation questions: open a GitHub issue with a [Question] prefix; see SUPPORT.md.
- Bugs: use the bug report issue template and include reproduction steps, environment details, and relevant output.
- Feature requests or research ideas: use the feature request / research idea issue template and explain the motivation and validation path.
- Security-sensitive findings: do not open a public issue; report privately through GitHub Security or contact ThalesMMS through GitHub. See SECURITY.md.
- Contributions: read CONTRIBUTING.md before opening a pull request.
The current version centers on three structural changes:
- explicit predator instances inside the world, with a backward-compatible primary lizard
- a primitive locomotion output space instead of high-level semantic actions
- standardized input and output interfaces for each center or cortex to reduce excessive coupling between networks
That core has since been extended with:
- reward profiles (classic, ecological, austere)
- explicit shelter geometry with entrance, interior, and deep zones
- predator profiles for visual and olfactory hunters, including multi-predator worlds
- a richer lizard state machine (PATROL, ORIENT, INVESTIGATE, CHASE, WAIT, RECOVER)
- lizard working memory for investigate targets, ambush windows, chase streaks, and recovery
- explicit spider memory for food, predator, safe shelter, and escape route
- optional recurrent memory inside selected proposer modules
- map templates and deterministic scenarios for behavioral evaluation
- behavioral scorecards, ablations, learning-evidence workflows, reward audits, and offline analysis
The system still uses small NumPy neural networks, online learning, and independent modules, but the current architecture is closer to a simplified organism under ecological constraints.
Probabilistic nighttime danger was replaced by a concrete predator in the environment.
The lizard:
- occupies a real grid cell
- has limited vision
- moves more slowly than the spider
- cannot enter the shelter
- causes pain and health loss on contact
- can be escaped if the spider keeps moving effectively
The motor system no longer chooses semantic actions such as MOVE_TO_FOOD or MOVE_TO_SHELTER.
The action space is now:
- MOVE_UP
- MOVE_DOWN
- MOVE_LEFT
- MOVE_RIGHT
- STAY
- ORIENT_UP
- ORIENT_DOWN
- ORIENT_LEFT
- ORIENT_RIGHT
ORIENT_* actions are active-sensing turns: they rotate the heading and refresh perception for the current tick, but they do not translate the spider.
That keeps the simulator closer to primitive locomotion and active-sensing control instead of scripted behavior verbs.
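The difference between MOVE_* and ORIENT_* actions can be illustrated with a small sketch. This is not the simulator's implementation: the grid convention (y grows downward), the `apply_action` helper, and the rule that a move also turns the heading are assumptions made for illustration. Only the nine action names come from the list above.

```python
# Assumed (dx, dy) offsets; the y-grows-downward convention is hypothetical.
DIRECTIONS = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}

ACTIONS = (
    "MOVE_UP", "MOVE_DOWN", "MOVE_LEFT", "MOVE_RIGHT", "STAY",
    "ORIENT_UP", "ORIENT_DOWN", "ORIENT_LEFT", "ORIENT_RIGHT",
)


def apply_action(position, heading, action):
    """Return (new_position, new_heading) after one primitive action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "STAY":
        return position, heading
    kind, direction = action.split("_", 1)
    if kind == "ORIENT":
        # Active-sensing turn: the heading changes, the position does not.
        return position, direction
    dx, dy = DIRECTIONS[direction]
    # Assumption: a move also turns the heading toward the direction of travel.
    return (position[0] + dx, position[1] + dy), direction
```

For example, `apply_action((2, 2), "UP", "ORIENT_LEFT")` leaves the position at `(2, 2)` and only changes the heading to `"LEFT"`.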
Because the motor output is now limited to primitive movement plus active-sensing orientation, feeding and rest emerge from spatial context:
- when the spider reaches food, feeding happens automatically
- when the spider reaches shelter under fatigue or night pressure, recovery happens automatically
- staying still can still help feeding or rest, but it is no longer a rigid prerequisite
The brain now contains five proposer modules plus action_center and motor_cortex:
- visual_cortex
- sensory_cortex
- hunger_center
- sleep_center
- alert_center
- action_center
- motor_cortex
The first five networks propose locomotion in the same output space. action_center applies valence-based priority gating before final arbitration, and motor_cortex acts as the final locomotor executor or corrector.
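To make the idea of gating proposals in a shared output space concrete, here is a rough sketch. The combination rule (gate-weighted sum of logits followed by argmax), the gate values, and the helper names are all assumptions for illustration; the actual arbitration chain is described in docs/action_center_design.md, and the final motor_cortex correction step is omitted here.

```python
# Illustrative only: the combination rule and all numbers are assumptions.
ACTIONS = (
    "MOVE_UP", "MOVE_DOWN", "MOVE_LEFT", "MOVE_RIGHT", "STAY",
    "ORIENT_UP", "ORIENT_DOWN", "ORIENT_LEFT", "ORIENT_RIGHT",
)


def gate_and_combine(proposals, gates):
    """Scale each proposer's logits by its gate weight and sum them per action."""
    combined = [0.0] * len(ACTIONS)
    for module, logits in proposals.items():
        if len(logits) != len(ACTIONS):
            raise ValueError(f"{module} must emit one logit per action")
        weight = gates.get(module, 1.0)  # modules without a gate pass through
        for index, value in enumerate(logits):
            combined[index] += weight * value
    return combined


def select_action(combined):
    """Pick the highest-scoring action; ties resolve to the earliest action."""
    best = max(range(len(ACTIONS)), key=lambda index: combined[index])
    return ACTIONS[best]


def one_hot(action, strength):
    """Build a logit vector that favors a single action."""
    return [strength if name == action else 0.0 for name in ACTIONS]


proposals = {
    "hunger_center": one_hot("MOVE_RIGHT", 3.0),  # food lies to the right
    "alert_center": one_hot("MOVE_LEFT", 2.0),    # threat lies to the right
}
# Hypothetical gates for a threat-dominant tick: hunger is partly suppressed.
threat_gates = {"hunger_center": 0.5, "alert_center": 1.0}

print(select_action(gate_and_combine(proposals, {})))            # ungated
print(select_action(gate_and_combine(proposals, threat_gates)))  # gated
```

The point of the sketch is that every proposer speaks the same nine-action vocabulary, so priority can be expressed as a per-module weight rather than as a separate semantic action.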
The current architecture also includes:
- fixed, named interfaces per module
- module dropout during training
- local reflexes per module, defined from the module's own interface
- auxiliary per-module targets ("reflex targets") to reduce coadaptation
- optional recurrent proposer state for selected modules through BrainAblationConfig.recurrent_modules
- explicit contribution metrics in action_center for dominance, agreement, and effective proposer count
- the local_credit_only ablation, which preserves current inference but removes global policy-gradient broadcast during modular training
Recurrent memory is opt-in per proposer. Set BrainAblationConfig(recurrent_modules=(...)) to make selected modules stateful within an episode, or use the canonical modular_recurrent and modular_recurrent_all ablation variants when comparing recurrent and feed-forward architectures. Hidden state resets at episode boundaries, so recurrence helps with within-episode temporal context rather than cross-episode carryover. Evaluation guidance and variant definitions live in docs/ablation_workflow.md.
Layer responsibilities:
- spider_cortex_sim/world.py: ecological dynamics, body state, and management of explicit observable memory
- spider_cortex_sim/interfaces.py: named contracts between world and brain
- spider_cortex_sim/modules.py: proposer networks only
- spider_cortex_sim/agent.py: local reflexes, auxiliary targets, valence gating, and final motor correction
The generated contract documentation lives in docs/interfaces.md. The short topology note for the newer arbitration chain lives in docs/action_center_design.md.
For systematic validation of modularity, see docs/architectural_ladder.md.
The ladder complements the ablation workflow: use docs/architectural_ladder.md as the diagnostic framework for deciding where a regression likely comes from, and use docs/ablation_workflow.md for the detailed variant definitions, competence-gap and shaping-gap reads, and benchmark-of-record comparison procedures.
The parallel B-series diagnostic strategy is documented in docs/b_series_strategy.md. It separates B0 legacy reproduction of the old semantic-action benchmark from B0 current bridge, which keeps the current primitive action space while restoring an internal semantic decision layer. B0 current bridge now uses a deliberately simple legacy-direct controller: the neural head still emits semantic logits for audit, but a trace-visible semantic controller selects the legacy intention before primitive execution. B1 and later B levels use transfer learning from the previous B checkpoint by default.
That policy is intentionally blocking at A5: no finer biological module is an automatic improvement, and no new center should be added until A2 learns, A3 preserves it, A4 does not collapse relative to the coarse control, local interface tasks pass, credit remains interpretable, and the candidate comes with an explicit hypothesis plus a selective ablation test.
Readers considering any new biological center should start with docs/architectural_ladder.md, not with interface expansion alone. That document is the source of truth for the prerequisite checklist, module admission criteria, and reusable proposal checklist, and it explicitly treats every proposed module as a hypothesis to test rather than an automatic improvement.
The 2D grid world contains:
- shelter or burrow (H)
- shelter spatial roles (entrance, inside, deep)
- walls or blocked cells (#)
- clutter terrain (:)
- food (F)
- spider (A)
- predator lizard or hunter instances (L)
- day/night cycle
- limited vision
- food and predator smell fields
- homeostatic pressures from hunger, fatigue, health, pain, and contact
If the spider and lizard occupy the same ASCII-rendered cell, the renderer shows X.
Predators are configured with PredatorProfile values in spider_cortex_sim/predator.py. A profile defines the predator name, visual range, smell range, detection style, move interval, and detection threshold.
Built-in profiles:
- DEFAULT_LIZARD_PROFILE: backward-compatible single-predator behavior matching the classic lizard parameters
- VISUAL_HUNTER_PROFILE: long visual range, short smell range, detection_style="visual"
- OLFACTORY_HUNTER_PROFILE: short visual range, long smell range, detection_style="olfactory"
SpiderWorld.reset() accepts predator_profiles=[...]. Passing one profile preserves the old single-predator API; passing several profiles spawns one predator per profile and creates one controller per predator. The compatibility helpers still work:
- world.lizard and world.lizard_pos() refer to the first predator
- world.predators, world.predator_positions(), world.predator_count, and world.get_predator(index) expose the full predator set
Scenario setup code can also assign explicit LizardState(profile=...) instances when a benchmark needs hand-placed predators.
.
├── README.md
├── docs/
│ ├── ablation_workflow.md
│ ├── action_center_design.md
│ ├── architectural_ladder.md
│ └── interfaces.md
├── spider_cortex_sim/
│ ├── __main__.py
│ ├── agent.py
│ ├── bus.py
│ ├── cli.py
│ ├── gui.py
│ ├── interfaces.py
│ ├── modules.py
│ ├── nn.py
│ ├── predator.py # PredatorProfile, DEFAULT_LIZARD_PROFILE, VISUAL_HUNTER_PROFILE, OLFACTORY_HUNTER_PROFILE
│ ├── simulation.py
│ └── world.py
└── tests/
Create and activate a virtual environment:
export PYTHON_BIN="${PYTHON_BIN:-python3.10}"
$PYTHON_BIN --version
$PYTHON_BIN -m venv venv
source venv/bin/activate
On Windows, use the Python launcher:
py -3.10 --version
py -3.10 -m venv venv
venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
Notes:
- Python 3.10+ is required for the simulator and CLI entrypoints
- If your supported interpreter is newer, substitute its versioned command such as python3.11
- numpy is required
- pygame-ce is optional and only needed for the graphical interface (--gui)
Budget profiles are the recommended way to run reproducible experiments. Start with the smoke profile for a first local run:
$PYTHON_BIN -m spider_cortex_sim --budget-profile smoke
Save a summary and trace:
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile smoke \
--summary spider_summary.json \
--trace spider_trace.jsonl
Render the final evaluation episode in ASCII:
$PYTHON_BIN -m spider_cortex_sim --budget-profile smoke --render-eval
Move to the dev budget when you want a slightly stronger reproducible local
benchmark:
$PYTHON_BIN -m spider_cortex_sim --budget-profile dev --summary spider_summary.json
Use explicit episode and step flags only when you want a custom non-profiled run:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 120 \
--eval-episodes 3 \
--max-steps 90
Train with the ecological profile and an alternate map:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 120 \
--eval-episodes 3 \
--max-steps 90 \
--reward-profile ecological \
--map-template side_burrow
Available map templates:
- central_burrow
- side_burrow
- corridor_escape
- two_shelters
- exposed_feeding_ground
- entrance_funnel
Map notes:
- corridor_escape uses NARROW terrain, representing a constrained passage
- two_shelters introduces two deep shelters competing with a central transition zone
- exposed_feeding_ground concentrates food in open terrain with more exposed approach paths
- entrance_funnel compresses the shelter entrance through a bottleneck that favors blocking and ambush
Reward profiles:
- classic: more guided, best for quick training and baseline runs
- ecological: less direct shaping and more pressure from world dynamics
- austere: minimal-progress baseline for shaping audits and contrast
The shaping-reduction program treats dense rewards as scaffolding unless they
survive an explicit audit. The working philosophy is simple: trust a behavior
more when it still appears under austere, where direct progress rewards,
shelter-entry bonuses, predator-escape bonuses, and day-exploration guidance
are removed or reduced. classic and ecological remain useful training and
comparison profiles, but claims should increasingly rest on behavior that
survives the austere profile.
The reduction roadmap in spider_cortex_sim/reward/shaping.py tracks the next
reward terms to defend, weaken, or investigate. The
SHAPING_REDUCTION_ROADMAP definition lives in that shaping.py submodule
inside the spider_cortex_sim/reward/ package:
- resting: high priority, kept weakened while rest outcome evidence is separated from configurable rest bonuses.
- sleep_debt_pressure: high priority, defended only while it behaves like physiological pressure rather than indirect shelter guidance.
- night_exposure, hunger_pressure, and fatigue_pressure: medium priority, currently defended as ecological or physiological costs, with bounded gap requirements.
- homeostasis_penalty and terrain_cost: medium priority, under investigation because they mix legitimate ecological pressure with possible hidden steering.
- action_cost: low priority, defended as a small universal energy cost.
Gap policy defines when dense profiles are too far ahead of austere. The hard limits are:
- classic_minus_austere scenario success delta <= 0.20
- ecological_minus_austere scenario success delta <= 0.15
- classic_minus_austere mean reward gap <= 0.50
- ecological_minus_austere mean reward gap <= 0.40
- austere survival rate >= 0.50
Warnings fire before gates. For upper-bound gaps, the warning threshold is
0.80 * hard_limit. For austere survival, the warning band starts below
0.50 / 0.80 = 0.625 and becomes a gate violation below 0.50.
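The warning and gate arithmetic above can be written out as a short sketch. The hard limits and the 0.80 factor come from the policy text; the dictionary keys, function names, and the exact handling of values that sit precisely on a threshold are assumptions.

```python
# Hard limits quoted from the gap policy; the key names are hypothetical.
UPPER_BOUND_LIMITS = {
    ("classic_minus_austere", "scenario_success_delta"): 0.20,
    ("ecological_minus_austere", "scenario_success_delta"): 0.15,
    ("classic_minus_austere", "mean_reward_gap"): 0.50,
    ("ecological_minus_austere", "mean_reward_gap"): 0.40,
}
AUSTERE_SURVIVAL_MIN = 0.50
WARNING_FRACTION = 0.80


def classify_gap(value, hard_limit):
    """Classify an upper-bound gap: warn above 0.80 * limit, fail above the limit."""
    if value > hard_limit:
        return "violation"
    if value > WARNING_FRACTION * hard_limit:
        return "warning"
    return "ok"


def classify_austere_survival(rate):
    """Classify austere survival: fail below 0.50, warn below 0.50 / 0.80 = 0.625."""
    if rate < AUSTERE_SURVIVAL_MIN:
        return "violation"
    if rate < AUSTERE_SURVIVAL_MIN / WARNING_FRACTION:
        return "warning"
    return "ok"


limit = UPPER_BOUND_LIMITS[("classic_minus_austere", "scenario_success_delta")]
print(classify_gap(0.18, limit))        # between 0.16 and 0.20
print(classify_austere_survival(0.60))  # between 0.50 and 0.625
```

Both example calls land in the warning band: the first gap sits between 0.16 and the 0.20 hard limit, and the second survival rate sits between the 0.50 gate and the 0.625 warning boundary.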
Evidence criteria classify reward terms as follows:
- removed or removable: austere survival reaches the threshold on all relevant scenarios and claim tests do not regress.
- weakened: austere survival reaches the threshold on a majority of relevant scenarios, while remaining classic/ecological gaps stay bounded.
- defended: the term has documented physiological or ecological necessity, not just behavioral convenience.
- under_investigation: more scenario evidence is needed before the term can be defended, weakened, or removed.
- outcome_signal: sparse outcome anchors such as feeding, predator contact, and death are tracked separately from dense progress shaping.
Scenario requirements decide how austere survival affects reports:
- Gate scenarios: night_rest, predator_edge, entrance_ambush, shelter_blockade, two_shelter_tradeoff, visual_olfactory_pincer, olfactory_ambush, and visual_hunter_open_field.
- Warning scenarios: recover_after_failed_chase, food_vs_predator_conflict, and sleep_vs_exploration_conflict.
- Diagnostic scenarios: open_field_foraging, corridor_gauntlet, exposed_day_foraging, and food_deprivation.
Primary claim tests now depend on austere survival. If a primary claim has
austere_survival_required=True, every gate scenario in that claim must pass
the austere survival threshold before the claim can pass. This applies to
learning_without_privileged_signals, escape_without_reflex_support, and
specialization_emerges_with_multiple_predators.
When a benchmark summary includes austere comparison data, read these fields first:
{
"austere_survival_summary": {
"overall_survival_rate": 0.75,
"gate_pass_count": 6,
"gate_fail_count": 2,
"warning_scenarios": [],
"gap_policy_violations": []
},
"austere_survival_gate_passed": false,
"shaping_dependent_behaviors": [
{
"scenario": "night_rest",
"profile": "classic",
"success_rate_delta": 0.35,
"limit": 0.20
}
]
}
In this example, the overall austere rate is informative but not sufficient:
two gate scenarios failed, so primary claims that rely on those gates cannot be
treated as valid. The shaping_dependent_behaviors entry says classic
outperformed austere on night_rest beyond the policy limit, so the roadmap
should treat the relevant rest or sleep terms as reduction risks.
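A reader could pull these fields out of a summary programmatically. The sketch below reuses the example JSON shown above; the `read_austere_fields` helper and the shape of its return value are hypothetical, and it assumes the fields sit at the top level of the object being inspected.

```python
import json

# The example block from above, embedded so the sketch is self-contained.
EXAMPLE_SUMMARY = json.loads("""
{
  "austere_survival_summary": {
    "overall_survival_rate": 0.75,
    "gate_pass_count": 6,
    "gate_fail_count": 2,
    "warning_scenarios": [],
    "gap_policy_violations": []
  },
  "austere_survival_gate_passed": false,
  "shaping_dependent_behaviors": [
    {"scenario": "night_rest", "profile": "classic",
     "success_rate_delta": 0.35, "limit": 0.20}
  ]
}
""")


def read_austere_fields(summary):
    """Collect the austere fields to read first, plus how far each delta exceeds its limit."""
    block = summary.get("austere_survival_summary", {})
    over_limit = [
        (row["scenario"], row["profile"],
         round(row["success_rate_delta"] - row["limit"], 2))
        for row in summary.get("shaping_dependent_behaviors", [])
    ]
    return {
        "gate_passed": bool(summary.get("austere_survival_gate_passed", False)),
        "failed_gates": block.get("gate_fail_count", 0),
        "overall_survival_rate": block.get("overall_survival_rate"),
        "over_limit": over_limit,
    }


report = read_austere_fields(EXAMPLE_SUMMARY)
print(report)
```

Reading `gate_passed` and `failed_gates` before `overall_survival_rate` mirrors the interpretation above: a healthy overall rate does not rescue a claim whose gate scenarios failed.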
Run a single scenario:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 0 \
--eval-episodes 0 \
--scenario night_rest
Run the full scenario suite:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 0 \
--eval-episodes 0 \
--scenario-suite
Available scenarios:
- night_rest
- predator_edge
- entrance_ambush
- open_field_foraging
- shelter_blockade
- recover_after_failed_chase
- corridor_gauntlet
- two_shelter_tradeoff
- exposed_day_foraging
- food_deprivation
- visual_olfactory_pincer
- olfactory_ambush
- visual_hunter_open_field
- food_vs_predator_conflict
- sleep_vs_exploration_conflict
Scenario-to-map specializations include:
- entrance_ambush, shelter_blockade, and recover_after_failed_chase use entrance_funnel
- open_field_foraging and exposed_day_foraging use exposed_feeding_ground
- visual_olfactory_pincer and visual_hunter_open_field use exposed_feeding_ground
- olfactory_ambush uses entrance_funnel
- corridor_gauntlet uses corridor_escape
- two_shelter_tradeoff uses two_shelters
Multi-predator scenario intent:
- visual_olfactory_pincer: the spider starts between a visible visual hunter in front and an olfactory hunter behind and downwind; it tests dual-threat perception and module specialization
- olfactory_ambush: an olfactory hunter waits near a shelter entrance where the spider can smell danger without seeing it; it targets sensory-cortex-led response
- visual_hunter_open_field: a fast visual hunter pressures the spider in open terrain; it targets visual-cortex-led response under exposed conditions
Run the full behavioral suite with explicit scorecards:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 0 \
--eval-episodes 0 \
--behavior-suite \
--full-summary
Run one behavioral scenario and export flat CSV:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 0 \
--eval-episodes 0 \
--behavior-scenario night_rest \
--behavior-csv spider_behavior.csv \
--full-summary
summary["behavior_evaluation"] includes:
- suite: aggregated scenario scorecards with success_rate, checks, behavior_metrics, diagnostics, and failures
- summary: overall suite success and detected regressions
- comparisons: optional profile/map/seed comparison matrices when behavioral comparison flags are used
- learning_evidence: optional comparison between a trained checkpoint and controls such as random_init, reflex_only, freeze_half_budget, and trained_long_budget
- claim_tests: optional experiment-of-record synthesis that composes the canonical learning-evidence, ablation, and noise-robustness primitives into per-claim pass/fail results
Weak-signal scenarios now publish scenario-owned interpretation metadata:
- diagnostic_focus
- success_interpretation
- failure_interpretation
- budget_note
The scenario diagnostics block also summarizes:
- primary_outcome
- outcome_distribution
- partial_progress_rate
- died_without_contact_rate
Behavioral scenarios are split between emergence gates and capability probes.
Emergence gates support claim tests: they ask whether learned behavior remains
present without privileged support, improves with memory, survives noise, or
specializes by predator type. Capability probes map narrower behavioral
boundaries inside the full benchmark. They can score 0.00 while still
producing interpretable evidence through failure_mode, progress_band, and
outcome_band.
The explicit capability probes are:
| Scenario | Target skill | Acceptable partial progress |
|---|---|---|
| open_field_foraging | food_vector_acquisition_exposed | Any positive food_distance_delta or left_shelter with food signal present indicates food-vector acquisition capability even if foraging_viable fails. |
| corridor_gauntlet | corridor_navigation_under_threat | Any positive food_distance_delta without contact indicates corridor navigation capability; survived_no_progress indicates shelter-exit failure distinct from navigation failure. |
| exposed_day_foraging | daytime_foraging_under_patrol | Any positive food_distance_delta indicates foraging capability even under threat; cautious_inert indicates arbitration chose safety over food. |
| food_deprivation | hunger_driven_commitment | commits_to_foraging=True with approaches_food=True indicates commitment capability even if timing_failure prevents full success. |
These four scenarios remain in the full behavioral benchmark with
benchmark_tier="capability" and is_capability_probe=True. They are excluded
from claim tests because their role is to expose capability boundaries and
calibration outcomes, not to serve as pass/fail evidence for the repository's
emergence hypotheses.
The scientific question in this repository is narrower than "does the score go up?" The emergence hypothesis is that the modular cortex learns reusable threat-sensitive behavior that remains present when privileged supports are removed, stays coherent under disturbance, and differentiates between predator types rather than collapsing into one generic escape reflex.
The supporting workflows still matter, but they are no longer the experiment-of-record by themselves:
- ablations isolate which modules and memory pathways matter
- learning-evidence comparisons separate trained behavior from initialization or reflex-only baselines
- the noise matrix checks whether behavior survives train/eval mismatch
Those workflows provide the raw evidence. The claim test suite is the formal gate that reads them together and decides whether the core scientific claims actually hold.
Run the canonical claim suite and write the full experiment record to JSON:
$PYTHON_BIN -m spider_cortex_sim \
--claim-test-suite \
--summary results.json
The five canonical claim tests are:
- learning_without_privileged_signals
  Hypothesis: trained behavior still beats an untrained policy after privileged reflex support is removed.
  Protocol: learning-evidence comparison from random_init to trained_without_reflex_support across night_rest, predator_edge, entrance_ambush, shelter_blockade, and two_shelter_tradeoff, with the leakage audit enforced.
  Success criterion: trained_without_reflex_support must improve scenario_success_rate over random_init by at least 0.15 and the leakage audit must report zero unresolved privileged-signal findings.
- escape_without_reflex_support
  Hypothesis: predator escape remains learned behavior rather than a reflex-only artifact.
  Protocol: learning-evidence comparison from reflex_only to trained_without_reflex_support over predator_edge, entrance_ambush, and shelter_blockade.
  Success criterion: trained_without_reflex_support must reach predator-response scenario_success_rate >= 0.60 and exceed reflex_only by at least 0.10.
- memory_improves_shelter_return
  Hypothesis: recurrent memory improves delayed shelter return and shelter trade-off behavior.
  Protocol: ablation comparison of modular_recurrent versus modular_full on night_rest and two_shelter_tradeoff.
  Success criterion: modular_recurrent must improve shelter-return scenario_success_rate by at least 0.10.
- noise_preserves_threat_valence
  Hypothesis: threat-sensitive arbitration survives train/eval noise mismatch instead of working only on the diagonal.
  Protocol: canonical noise-robustness matrix, comparing diagonal and off-diagonal aggregate scores over the threat-response scenarios.
  Success criterion: the off-diagonal threat-response score must stay at least 0.60, and the diagonal-minus-off-diagonal gap must stay at most 0.15.
- specialization_emerges_with_multiple_predators
  Hypothesis: multiple predator ecologies produce predator-type specialization instead of one undifferentiated threat pathway.
  Protocol: predator-type ablation comparison across visual_olfactory_pincer, olfactory_ambush, and visual_hunter_open_field, paired with type-specific cortex engagement checks in the full modular policy.
  Success criterion: drop_visual_cortex must drive visual_minus_olfactory_success_rate <= -0.10, drop_sensory_cortex must drive visual_minus_olfactory_success_rate >= 0.10, and the reference policy must show the expected cortex engagement in at least 2 of the 3 specialization scenarios.
Generic benchmarks such as the ablation suite and the noise matrix still provide the supporting detail, but the claim tests are the scientific nucleus: they are the concise pass/fail record for whether the main emergence story survives contact with the actual measurements.
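The numeric success criteria of the five claim tests can be summarized as plain predicates. This is a sketch for reading the thresholds side by side, not the repository's claim-test code: the function and parameter names are hypothetical, and real runs aggregate rates across seeds and scenarios before any comparison of this kind.

```python
def learning_without_privileged_signals(trained, random_init, unresolved_findings):
    """Improve over random_init by at least 0.15 with a clean leakage audit."""
    return (trained - random_init) >= 0.15 and unresolved_findings == 0


def escape_without_reflex_support(trained, reflex_only):
    """Reach at least 0.60 and exceed reflex_only by at least 0.10."""
    return trained >= 0.60 and (trained - reflex_only) >= 0.10


def memory_improves_shelter_return(modular_recurrent, modular_full):
    """The recurrent variant must improve shelter-return success by at least 0.10."""
    return (modular_recurrent - modular_full) >= 0.10


def noise_preserves_threat_valence(diagonal, off_diagonal):
    """Off-diagonal score at least 0.60 and diagonal-minus-off-diagonal gap at most 0.15."""
    return off_diagonal >= 0.60 and (diagonal - off_diagonal) <= 0.15


def specialization_emerges(drop_visual_delta, drop_sensory_delta, engaged_scenarios):
    """Each argument is a visual_minus_olfactory_success_rate or an engagement count."""
    return (
        drop_visual_delta <= -0.10
        and drop_sensory_delta >= 0.10
        and engaged_scenarios >= 2
    )


print(learning_without_privileged_signals(0.70, 0.40, unresolved_findings=0))
print(noise_preserves_threat_valence(diagonal=0.90, off_diagonal=0.65))
```

Values that land exactly on a threshold are sensitive to floating-point rounding in a sketch like this, so the examples deliberately stay away from the boundaries.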
$PYTHON_BIN -m spider_cortex_sim \
--gui \
--episodes 120 \
--eval-episodes 3 \
--max-steps 90 \
--reward-profile classic \
--map-template central_burrow
The GUI window is resizable. When you resize it, the grid automatically zooms to fit the available space. The model/architecture controls stay docked in the left sidebar, and the ecological diagnostics stay docked in the right-side panel.
To adjust scaling behavior, see the GUI controller logic (spider_cortex_sim/gui/controller.py), including the min/max cell size clamps and the ui_scale used for font sizing.
The GUI displays:
- a left model selector for modular_full, A0, B0 current bridge, and B0 legacy
- visible architecture metadata including runtime, action space, B level, B mode, seed, map, and reward profile
- B-series diagnostics such as learned semantic action, selected semantic action, B0-current source/reason, bridge primitive action, bridge reason, and external override count
- an Evolution save action that writes audit snapshots under artifacts/gui_evolution_snapshots/
- the predator lizard on the grid
- shelter roles and terrain types
- contact, sighting, and escape counters
- lizard position, mode, and current target
- explicit spider memory and sleep debt
- recent reward components
- nighttime shelter-role distribution and predator-state occupancy
B0 legacy semantic is shown as an isolated benchmark mode. It does not use the current SpiderWorld, does not add semantic actions to the current public action space, and uses a simplified legacy panel/grid so it is not confused with current-world ecological success. B0 current bridge runs in the current world as a simple semantic bridge and still submits only primitive actions to world.step().
Useful shortcuts:
- V: toggle visibility overlay
- M: toggle smell heatmap
Train and save:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 120 --eval-episodes 3 --max-steps 90 \
--save-brain spider_brain
Load and continue training:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 60 --eval-episodes 3 --max-steps 90 \
--load-brain spider_brain \
--save-brain spider_brain
Load only selected modules:
$PYTHON_BIN -m spider_cortex_sim \
--episodes 60 --eval-episodes 3 --max-steps 90 \
--load-brain spider_brain \
--load-modules visual_cortex hunger_center
Compatibility notes:
- the current architecture uses an explicit interface signature
- older saves predating the current interface standardization are rejected with explicit incompatibility errors
- older checkpoints predating sleep_phase, rest_streak, sleep_debt, shelter-role signals, certainty/occlusion signals, explicit memory, or oriented perception are also incompatible with the current architecture
- the versioned interface registry and generated contract docs are in docs/interfaces.md
Because interface descriptions are part of the current fingerprinted metadata, this English translation pass also changes interface and architecture fingerprints. Older checkpoints may therefore fail compatibility checks even though behavior and identifiers were not intentionally refactored.
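Why a wording change alone can invalidate a checkpoint is easier to see with a toy fingerprint. The sketch below hashes canonical JSON with SHA-256; this scheme, the `fingerprint` helper, and the metadata fields are assumptions for illustration and are not taken from the repository's fingerprinting code.

```python
import hashlib
import json


def fingerprint(metadata):
    """Hash a canonical JSON encoding so key order does not affect the result."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical interface metadata: only the description differs between the two.
before = {
    "name": "example_module",
    "signals": ["signal_a", "signal_b"],
    "description": "original wording",
}
after = dict(before, description="reworded description")

print(fingerprint(before) == fingerprint(after))  # descriptions are hashed too
```

Under a scheme like this, the signal list and module name are identical in both records, yet the fingerprints differ because the description is part of the hashed payload.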
Run the full suite:
$PYTHON_BIN -m unittest discover -s tests -v
The tests cover:
- standardized interface shapes and generated interface docs
- contextual feeding and rest
- auditable reward decomposition
- sleep progression SETTLING -> RESTING -> DEEP_SLEEP, sleep debt, and interruptions
- predator contact with real damage
- shelter geometry and occlusion
- explicit spider memory
- map templates and reachability
- deterministic scenario regressions
- memory-guided escape and foraging after loss of sight
- scenario runners and deterministic predator-response latency
- online parameter updates
- lightweight trainability checks across reward profiles, comparisons, and alternate maps
The summary and trace include:
- per-step reward_components
- reward_audit, including component inventory, shaping categories, and leakage candidates
- nighttime shelter occupancy and nighttime stillness
- nighttime shelter-role distribution (outside, entrance, inside, deep)
- predator-response latency
- predator contacts, escapes, and response latency by predator type
- dominant module response by predator type
- reward_profile and map_template
- config.operational_profile, including active thresholds and operational weights
- config.budget, including resolved profile, benchmark strength, seeds, and explicit overrides
- checkpointing when --checkpoint-selection best is used
- explicit certainty and occlusion fields per visual channel
- world-layer maintained heading and decayed percept traces in trace metadata
- explicit predator_motion_salience
- normalized memory vectors in trace metadata
- predator occupancy by state (PATROL, ORIENT, INVESTIGATE, CHASE, WAIT, RECOVER)
- predator state transitions and dominant predator state per episode or scenario
- food and shelter distance deltas
- event_log stages per tick
- behavioral scorecards per scenario in behavior_evaluation
- diagnostic per-episode bands such as progress_band and outcome_band
- explicit action_center arbitration outputs such as winning_valence, valence_scores, module_gates, suppressed_modules, and evidence
When --debug-trace is combined with --trace, each tick also includes:
- serialized observations before and after transition
- reward components
- normalized memory vectors
- per-module logits before reflex, reflex delta, and post-reflex logits
- logits after valence gating, per-module gate_weight, and debug.arbitration
- effective_reflex_scale, module_reflex_override, module_reflex_dominance, and final_reflex_override
- full predator internal state
Use --full-summary to print the complete JSON summary to stdout.
Specialization metrics compare how modules respond when visual versus olfactory predators are the primary threat. A high predator-type specialization score means response distributions differ by predator type, such as stronger visual_cortex dominance for visual hunters and stronger sensory_cortex dominance for olfactory hunters. A low score means the same modules respond similarly to both predator types, which can be useful as a baseline but does not show sensory-niche specialization.
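One way to picture a score that is high when response distributions differ by predator type is the total variation distance between two module-dominance distributions. This is a stand-in chosen for illustration, not the metric the repository computes; the function name and the example distributions are hypothetical.

```python
def specialization_score(visual_response, olfactory_response):
    """Total variation distance between two module-dominance distributions.

    Returns 0.0 when both predator types elicit the same response mix and
    1.0 when the responding modules do not overlap at all.
    """
    modules = sorted(set(visual_response) | set(olfactory_response))
    return 0.5 * sum(
        abs(visual_response.get(name, 0.0) - olfactory_response.get(name, 0.0))
        for name in modules
    )


# Hypothetical dominance shares under each predator type.
under_visual_hunter = {"visual_cortex": 0.75, "sensory_cortex": 0.25}
under_olfactory_hunter = {"visual_cortex": 0.25, "sensory_cortex": 0.75}

print(specialization_score(under_visual_hunter, under_olfactory_hunter))
print(specialization_score(under_visual_hunter, under_visual_hunter))
```

The first call gives an intermediate score because the dominance shares swap between predator types; the second gives zero because the response mix is unchanged.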
The CLI exposes explicit budget profiles:
- smoke: quick sanity or CI profile (6 episodes, 1 evaluation run, 60 steps, seed 7)
- dev: short reproducible local benchmark (12 episodes, 2 evaluation runs, 90 steps, seeds 7/17/29)
- report: stronger reporting workflow (24 episodes, 4 evaluation runs, 120 steps, 2 repetitions per scenario, seeds 7/17/29/41/53)
- paper: publication-grade benchmark-of-record workflow; requires --checkpoint-selection best and records the resolved seed and checkpoint budget in the summary
Canonical commands:
# smoke: sanity / CI
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile smoke \
--behavior-suite --full-summary
# dev: fast local benchmark
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile dev \
--ablation-suite --full-summary
# report: stronger benchmark + automatic checkpoint selection
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile report \
--checkpoint-selection best \
--ablation-suite --full-summary
Without --budget-profile, the run still works in custom mode and records the effective values and overrides in summary["config"]["budget"].
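To check which budget a run actually used, the budget block can be read back from a summary file written with --summary. This is a minimal sketch: only the summary["config"]["budget"] path comes from this README, the helper name load_budget is hypothetical, and the keys inside the block are printed generically because their names are not listed here.

```python
import json
import sys
from pathlib import Path
from typing import Any


def load_budget(summary_path: str) -> dict[str, Any]:
    """Return summary["config"]["budget"], or {} if either level is absent."""
    summary = json.loads(Path(summary_path).read_text(encoding="utf-8"))
    return summary.get("config", {}).get("budget", {})


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python show_budget.py spider_summary.json
    for key, value in sorted(load_budget(sys.argv[1]).items()):
        print(f"{key}: {value}")
```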
Use the paper budget with best-checkpoint selection for publication-facing architecture claims. Add --benchmark-package to write a reproducible package containing the manifest, resolved configuration, seed-level rows, uncertainty-aware aggregate tables, claim-test tables, effect-size tables, reports, plots, supporting CSVs, and limitations.
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile paper \
--checkpoint-selection best \
--ablation-suite \
--summary spider_architecture_paper_summary.json \
--behavior-csv spider_architecture_paper_rows.csv \
--benchmark-package spider_architecture_paper_package \
--full-summary
The package manifest records file hashes, seed count, confidence level, resolved budget metadata, checkpoint-selection metadata, and an environment block with git commit/tag/dirty-state plus Python version and platform. pip_freeze.txt preserves the dependency snapshot as a hashed package artifact. The CLI rejects --benchmark-package unless both --budget-profile paper and --checkpoint-selection best are present.
Uncertainty reporting is seed-level. Confidence intervals are percentile bootstrap intervals over seed-level metric values and default to 95%. Claim-test pass/fail logic remains based on point estimates; the package adds reference_uncertainty, comparison_uncertainty, delta_uncertainty, and effect_size_uncertainty for reporting. Effect-size tables report Cohen's d with negligible, small, medium, and large magnitude labels.
Compare reward profiles on the current map:
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile dev \
--compare-profiles --full-summary
Compare maps under the current reward profile:
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile dev \
--reward-profile ecological \
--compare-maps --full-summary
Compare the behavioral suite across profiles:
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile dev \
--behavior-compare-profiles --full-summary
Compare the behavioral suite across maps and export CSV:
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile dev \
--reward-profile ecological \
--behavior-compare-maps \
--behavior-csv spider_behavior_compare.csv \
--full-summary
The shaping audit uses austere as the minimal baseline and records deltas against it under summary["behavior_evaluation"]["shaping_audit"].
The project includes a separate runner that transforms summary.json, trace.jsonl, and behavior_csv into an offline analysis bundle:
$PYTHON_BIN -m spider_cortex_sim.offline_analysis \
--summary spider_summary_compare.json \
--trace spider_trace_debug.jsonl \
--behavior-csv spider_behavior_compare.csv \
--output-dir offline_analysis
Rules:
- --output-dir is required
- at least one of --summary, --trace, or --behavior-csv is required
- the report is always emitted, even with partial input
- missing blocks are reported in report.md and report.json instead of aborting execution
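The rules above map naturally onto a small argparse check. The sketch below is not the project's parser; it only mirrors the four documented flags, and the names parse_offline_args and missing_inputs are hypothetical.

```python
import argparse
from typing import List, Optional, Sequence


def parse_offline_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the documented flags and enforce the two 'required' rules."""
    parser = argparse.ArgumentParser(prog="offline_analysis_sketch")
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--summary")
    parser.add_argument("--trace")
    parser.add_argument("--behavior-csv")
    args = parser.parse_args(argv)
    if not (args.summary or args.trace or args.behavior_csv):
        # parser.error prints a usage message and exits with status 2.
        parser.error("at least one of --summary, --trace, or --behavior-csv is required")
    return args


def missing_inputs(args: argparse.Namespace) -> List[str]:
    """List absent inputs so a report can note them instead of aborting."""
    return [
        name
        for name in ("summary", "trace", "behavior_csv")
        if getattr(args, name) is None
    ]


args = parse_offline_args(["--output-dir", "offline_analysis", "--trace", "t.jsonl"])
print(missing_inputs(args))  # prints ['summary', 'behavior_csv']
```

Keeping the "missing input" list separate from validation reflects the partial-input rule: only a run with no inputs at all is rejected.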
The offline analysis output directory also includes a navigation index at INDEX.md.
- Open INDEX.md first to jump to scenario summaries, gate results, claims, and evidence.
- All links are relative, so you can move/copy the report folder and still browse it locally.
Compare the modular reference against the canonical ablation suite:
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile dev \
--ablation-suite \
--behavior-csv spider_ablation_rows.csv \
--full-summary
Run the learning_evidence suite under the smoke budget:
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile smoke \
--learning-evidence \
--behavior-scenario night_rest \
--behavior-csv spider_learning_evidence_rows.csv \
--full-summary
Run the canonical short-vs-long learning-evidence comparison:
$PYTHON_BIN -m spider_cortex_sim \
--budget-profile smoke \
--learning-evidence \
--learning-evidence-long-budget-profile report \
--behavior-suite \
--summary spider_learning_evidence_summary.json \
--behavior-csv spider_learning_evidence_rows.csv \
--full-summary
The detailed ablation workflow, variant definitions, and canonical check-in table live in docs/ablation_workflow.md.
- The system is biologically inspired, not biologically faithful
- The "return vector to shelter" is a simplified form of proprioception or minimal spatial memory
- Explicit memory is perception-grounded: data sources are limited to local visual perception, contact events, and movement history. The environment pipeline maintains mechanics such as aging and TTL expiration, but it cannot inject information the spider has not perceived.
- Local per-module reflexes act like innate behavior that online learning later refines
- There is no giant fallback center that integrates everything; each proposer receives only its own interface and emits only standardized locomotion proposals
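The perception-grounded memory rule in the list above can be illustrated with a small store in which only a perception call writes entries, while the per-tick maintenance step may only age or expire them. This is a sketch under assumptions: the class and method names, the integer TTL, and the entry layout are hypothetical and do not describe the simulator's actual memory structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class MemoryEntry:
    position: Tuple[int, int]
    age: int = 0  # ticks since the spider last perceived this item


class PerceptMemory:
    """Memory whose contents can only originate from the spider's own percepts."""

    def __init__(self, ttl: int) -> None:
        self.ttl = ttl
        self._entries: Dict[str, MemoryEntry] = {}

    def perceive(self, key: str, position: Tuple[int, int]) -> None:
        # The only write path: called when the spider actually perceives something.
        self._entries[key] = MemoryEntry(position)

    def tick(self) -> None:
        # Environment-side maintenance: age entries and drop expired ones.
        # It never adds or updates positions, so it cannot leak unseen state.
        for key in list(self._entries):
            entry = self._entries[key]
            entry.age += 1
            if entry.age > self.ttl:
                del self._entries[key]

    def recall(self, key: str) -> Optional[MemoryEntry]:
        return self._entries.get(key)


memory = PerceptMemory(ttl=2)
memory.perceive("food", (3, 4))
for _ in range(3):
    memory.tick()
print(memory.recall("food"))  # prints None: the entry expired after its TTL
```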
- implemented: multiple predators with different sensory niches
- implemented: a more strongly oriented field of view with active sensing
- separate locomotion into gait, speed, and body orientation
- migrate the networks to PyTorch while preserving the same modular interface signature
Active sensing now uses a tightened 45 degree foveal cone and 70 degree peripheral cone. ORIENT_* actions refresh current-tick perception immediately after the heading change, and scan recency is tracked so observations can distinguish fresh inspection from stale or never-scanned headings.
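As a geometric illustration of the two cones, the sketch below classifies a target bearing relative to the current heading. It assumes the 45 and 70 degree figures are full cone widths centered on the heading (half-angles of 22.5 and 35 degrees); if they are half-angles in the simulator, the defaults should be doubled. The function names are hypothetical, and the simulator's own perception code may work on grid cells rather than continuous angles.

```python
def angular_offset(heading_deg: float, bearing_deg: float) -> float:
    """Smallest absolute angle between heading and bearing, in [0, 180]."""
    return abs((bearing_deg - heading_deg + 180.0) % 360.0 - 180.0)


def classify_bearing(
    heading_deg: float,
    bearing_deg: float,
    foveal_width_deg: float = 45.0,      # assumed to be a full cone width
    peripheral_width_deg: float = 70.0,  # assumed to be a full cone width
) -> str:
    """Return 'foveal', 'peripheral', or 'outside' for a target bearing."""
    offset = angular_offset(heading_deg, bearing_deg)
    if offset <= foveal_width_deg / 2.0:
        return "foveal"
    if offset <= peripheral_width_deg / 2.0:
        return "peripheral"
    return "outside"


print(classify_bearing(0.0, 20.0))   # prints foveal
print(classify_bearing(0.0, 30.0))   # prints peripheral
print(classify_bearing(350.0, 90.0)) # prints outside
```

The modulo arithmetic handles wrap-around, so a heading of 350 degrees and a bearing of 10 degrees are 20 degrees apart rather than 340.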
