Skip to content

ThalesMMS/BIONG

Repository files navigation

Biologically-Inspired Organism, Not a Game

Screenshot

This project implements a simulated spider with a modular brain, explicit predator pressure, standardized neural interfaces, online learning, deterministic behavioral scenarios, and reproducible evaluation workflows.

Release Posture

This repository is an experimental research codebase. It is appropriate for research iteration, benchmark packaging, and publication-facing result bundles, but it should not be read as a claim of production stability or a stable public API.

In this repo, "publication-grade" means benchmark rigor and reproducibility for benchmark-of-record outputs. It does not mean long-term interface stability.

The benchmark-of-record artifacts are the benchmark package plus the run summary and exported behavior rows used to build it. The package-of-record includes benchmark_manifest.json, resolved_config.json, pip_freeze.txt, seed_level_rows.csv, aggregate and claim tables, plots, and limitations.txt.

See RELEASE.md for the checklist and release posture details, CHANGELOG.md for reproducibility-relevant changes, and CITATION.cff for citation guidance.

Start Here

If you want the shortest trustworthy first run, use the smoke budget from the repository root:

Python 3.10+ is required for the full simulator and CLI. The command examples use PYTHON_BIN so the interpreter is selected in one place:

export PYTHON_BIN="${PYTHON_BIN:-python3.10}"
$PYTHON_BIN --version

If that interpreter is unavailable, install Python 3.10+ or set PYTHON_BIN to your available versioned interpreter, such as python3.11.

The README command examples assume you run them from the repository root. From there, Python can import spider_cortex_sim without a PYTHONPATH=. prefix. If you run commands from another directory, set PYTHONPATH to the repository root first.

$PYTHON_BIN -m spider_cortex_sim --budget-profile smoke --summary spider_summary.json

This runs a short training-plus-evaluation cycle and writes ./spider_summary.json. The smoke budget is intended to finish in under a minute on a local development machine.

For a quick success check, open spider_summary.json and look at:

  • config.budget for the resolved budget profile, benchmark strength, and seed
  • training_last_window.survival_rate for survival over the final training window
  • evaluation.mean_reward for average reward during evaluation
  • evaluation.survival_rate for post-training survival performance
  • parameter_norms for the final parameter magnitudes of each module

Add --trace spider_trace.jsonl if you want a per-tick evaluation trace for deeper inspection.

Choose Your Path

Support And Reporting

  • Usage or interpretation questions: open a GitHub issue with a [Question] prefix; see SUPPORT.md.
  • Bugs: use the bug report issue template and include reproduction steps, environment details, and relevant output.
  • Feature requests or research ideas: use the feature request / research idea issue template and explain the motivation and validation path.
  • Security-sensitive findings: do not open a public issue; report privately through GitHub Security or contact ThalesMMS through GitHub. See SECURITY.md.
  • Contributions: read CONTRIBUTING.md before opening a pull request.

The current version centers on three structural changes:

  1. explicit predator instances inside the world, with a backward-compatible primary lizard
  2. a primitive locomotion output space instead of high-level semantic actions
  3. standardized input and output interfaces for each center or cortex to reduce excessive coupling between networks

That core has since been extended with:

  • reward profiles (classic, ecological, austere)
  • explicit shelter geometry with entrance, interior, and deep zones
  • predator profiles for visual and olfactory hunters, including multi-predator worlds
  • a richer lizard state machine (PATROL, ORIENT, INVESTIGATE, CHASE, WAIT, RECOVER)
  • lizard working memory for investigate targets, ambush windows, chase streaks, and recovery
  • explicit spider memory for food, predator, safe shelter, and escape route
  • optional recurrent memory inside selected proposer modules
  • map templates and deterministic scenarios for behavioral evaluation
  • behavioral scorecards, ablations, learning-evidence workflows, reward audits, and offline analysis

The system still uses small NumPy neural networks, online learning, and independent modules, but the current architecture is closer to a simplified organism under ecological constraints.

What Changed

1. Explicit Predator: The Lizard

Probabilistic nighttime danger was replaced by a concrete predator in the environment.

The lizard:

  • occupies a real grid cell
  • has limited vision
  • moves more slowly than the spider
  • cannot enter the shelter
  • causes pain and health loss on contact
  • can be escaped if the spider keeps moving effectively

2. Primitive Motor Output

The motor system no longer chooses semantic actions such as MOVE_TO_FOOD or MOVE_TO_SHELTER.

The action space is now:

  • MOVE_UP
  • MOVE_DOWN
  • MOVE_LEFT
  • MOVE_RIGHT
  • STAY
  • ORIENT_UP
  • ORIENT_DOWN
  • ORIENT_LEFT
  • ORIENT_RIGHT

ORIENT_* actions are active-sensing turns: they rotate the heading and refresh perception for the current tick, but they do not translate the spider.

That keeps the simulator closer to primitive locomotion and active-sensing control instead of scripted behavior verbs.

3. Eating And Sleeping As Situated Behaviors

Because the motor output is now limited to primitive movement plus active-sensing orientation, feeding and rest emerge from spatial context:

  • when the spider reaches food, feeding happens automatically
  • when the spider reaches shelter under fatigue or night pressure, recovery happens automatically
  • staying still can still help feeding or rest, but it is no longer a rigid prerequisite

Modular Architecture

The brain now contains five proposer modules plus action_center and motor_cortex:

  1. visual_cortex
  2. sensory_cortex
  3. hunger_center
  4. sleep_center
  5. alert_center
  6. action_center
  7. motor_cortex

The first five networks propose locomotion in the same output space. action_center applies valence-based priority gating before final arbitration, and motor_cortex acts as the final locomotor executor or corrector.

The current architecture also includes:

  • fixed, named interfaces per module
  • module dropout during training
  • local reflexes per module, defined from the module's own interface
  • auxiliary per-module targets ("reflex targets") to reduce coadaptation
  • optional recurrent proposer state for selected modules through BrainAblationConfig.recurrent_modules
  • explicit contribution metrics in action_center for dominance, agreement, and effective proposer count
  • the local_credit_only ablation, which preserves current inference but removes global policy-gradient broadcast during modular training

Recurrent memory is opt-in per proposer. Set BrainAblationConfig(recurrent_modules=(...)) to make selected modules stateful within an episode, or use the canonical modular_recurrent and modular_recurrent_all ablation variants when comparing recurrent and feed-forward architectures. Hidden state resets at episode boundaries, so recurrence helps with within-episode temporal context rather than cross-episode carryover. Evaluation guidance and variant definitions live in docs/ablation_workflow.md.

Layer responsibilities:

  • spider_cortex_sim/world.py: ecological dynamics, body state, and management of explicit observable memory
  • spider_cortex_sim/interfaces.py: named contracts between world and brain
  • spider_cortex_sim/modules.py: proposer networks only
  • spider_cortex_sim/agent.py: local reflexes, auxiliary targets, valence gating, and final motor correction

The generated contract documentation lives in docs/interfaces.md. The short topology note for the newer arbitration chain lives in docs/action_center_design.md.

Architectural Ladder Protocol

For systematic validation of modularity, see docs/architectural_ladder.md.

The ladder complements the ablation workflow: use docs/architectural_ladder.md as the diagnostic framework for deciding where a regression likely comes from, and use docs/ablation_workflow.md for the detailed variant definitions, competence-gap and shaping-gap reads, and benchmark-of-record comparison procedures.

The parallel B-series diagnostic strategy is documented in docs/b_series_strategy.md. It separates B0 legacy reproduction of the old semantic-action benchmark from B0 current bridge, which keeps the current primitive action space while restoring an internal semantic decision layer. B0 current bridge now uses a deliberately simple legacy-direct controller: the neural head still emits semantic logits for audit, but a trace-visible semantic controller selects the legacy intention before primitive execution. B1 and later B levels use transfer learning from the previous B checkpoint by default.

That policy is intentionally blocking at A5: no finer biological module is an automatic improvement, and no new center should be added until A2 learns, A3 preserves it, A4 does not collapse relative to the coarse control, local interface tasks pass, credit remains interpretable, and the candidate comes with an explicit hypothesis plus a selective ablation test.

A5 Admission Policy

Readers considering any new biological center should start with docs/architectural_ladder.md, not with interface expansion alone. That document is the source of truth for the prerequisite checklist, module admission criteria, and reusable proposal checklist, and it explicitly treats every proposed module as a hypothesis to test rather than an automatic improvement.

Environment

The 2D grid world contains:

  • shelter or burrow (H)
  • shelter spatial roles (entrance, inside, deep)
  • walls or blocked cells (#)
  • clutter terrain (:)
  • food (F)
  • spider (A)
  • predator lizard or hunter instances (L)
  • day/night cycle
  • limited vision
  • food and predator smell fields
  • homeostatic pressures from hunger, fatigue, health, pain, and contact

If the spider and lizard occupy the same ASCII-rendered cell, the renderer shows X.

Predator Profiles

Predators are configured with PredatorProfile values in spider_cortex_sim/predator.py. A profile defines the predator name, visual range, smell range, detection style, move interval, and detection threshold.

Built-in profiles:

  • DEFAULT_LIZARD_PROFILE: backward-compatible single-predator behavior matching the classic lizard parameters
  • VISUAL_HUNTER_PROFILE: long visual range, short smell range, detection_style="visual"
  • OLFACTORY_HUNTER_PROFILE: short visual range, long smell range, detection_style="olfactory"

SpiderWorld.reset() accepts predator_profiles=[...]. Passing one profile preserves the old single-predator API; passing several profiles spawns one predator per profile and creates one controller per predator. The compatibility helpers still work:

  • world.lizard and world.lizard_pos() refer to the first predator
  • world.predators, world.predator_positions(), world.predator_count, and world.get_predator(index) expose the full predator set

Scenario setup code can also assign explicit LizardState(profile=...) instances when a benchmark needs hand-placed predators.

Project Layout

.
├── README.md
├── docs/
│   ├── ablation_workflow.md
│   ├── action_center_design.md
│   ├── architectural_ladder.md
│   └── interfaces.md
├── spider_cortex_sim/
│   ├── __main__.py
│   ├── agent.py
│   ├── bus.py
│   ├── cli.py
│   ├── gui.py
│   ├── interfaces.py
│   ├── modules.py
│   ├── nn.py
│   ├── predator.py      # PredatorProfile, DEFAULT_LIZARD_PROFILE, VISUAL_HUNTER_PROFILE, OLFACTORY_HUNTER_PROFILE
│   ├── simulation.py
│   └── world.py
└── tests/

Installation

Create and activate a virtual environment:

export PYTHON_BIN="${PYTHON_BIN:-python3.10}"
$PYTHON_BIN --version
$PYTHON_BIN -m venv venv
source venv/bin/activate

On Windows, use the Python launcher:

py -3.10 --version
py -3.10 -m venv venv
venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Notes:

  • Python 3.10+ is required for the simulator and CLI entrypoints
  • If your supported interpreter is newer, substitute its versioned command such as python3.11
  • numpy is required
  • pygame-ce is optional and only needed for the graphical interface (--gui)

Quick Start

Budget profiles are the recommended way to run reproducible experiments. Start with the smoke profile for a first local run:

$PYTHON_BIN -m spider_cortex_sim --budget-profile smoke

Save a summary and trace:

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile smoke \
  --summary spider_summary.json \
  --trace spider_trace.jsonl

Render the final evaluation episode in ASCII:

$PYTHON_BIN -m spider_cortex_sim --budget-profile smoke --render-eval

Move to the dev budget when you want a slightly stronger reproducible local benchmark:

$PYTHON_BIN -m spider_cortex_sim --budget-profile dev --summary spider_summary.json

Use explicit episode and step flags only when you want a custom non-profiled run:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 120 \
  --eval-episodes 3 \
  --max-steps 90

Reward Profiles And Maps

Train with the ecological profile and an alternate map:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 120 \
  --eval-episodes 3 \
  --max-steps 90 \
  --reward-profile ecological \
  --map-template side_burrow

Available map templates:

  • central_burrow
  • side_burrow
  • corridor_escape
  • two_shelters
  • exposed_feeding_ground
  • entrance_funnel

Map notes:

  • corridor_escape uses NARROW terrain, representing a constrained passage
  • two_shelters introduces two deep shelters competing with a central transition zone
  • exposed_feeding_ground concentrates food in open terrain with more exposed approach paths
  • entrance_funnel compresses the shelter entrance through a bottleneck that favors blocking and ambush

Reward profiles:

  • classic: more guided, best for quick training and baseline runs
  • ecological: less direct shaping and more pressure from world dynamics
  • austere: minimal-progress baseline for shaping audits and contrast

Shaping Reduction Program

The shaping-reduction program treats dense rewards as scaffolding unless they survive an explicit audit. The working philosophy is simple: trust a behavior more when it still appears under austere, where direct progress rewards, shelter-entry bonuses, predator-escape bonuses, and day-exploration guidance are removed or reduced. classic and ecological remain useful training and comparison profiles, but claims should increasingly rest on behavior that survives the austere profile.

The reduction roadmap in spider_cortex_sim/reward/shaping.py tracks the next reward terms to defend, weaken, or investigate. The SHAPING_REDUCTION_ROADMAP definition lives in that shaping.py submodule inside the spider_cortex_sim/reward/ package:

  • resting: high priority, kept weakened while rest outcome evidence is separated from configurable rest bonuses.
  • sleep_debt_pressure: high priority, defended only while it behaves like physiological pressure rather than indirect shelter guidance.
  • night_exposure, hunger_pressure, and fatigue_pressure: medium priority, currently defended as ecological or physiological costs, with bounded gap requirements.
  • homeostasis_penalty and terrain_cost: medium priority, under investigation because they mix legitimate ecological pressure with possible hidden steering.
  • action_cost: low priority, defended as a small universal energy cost.

Gap policy defines when dense profiles are too far ahead of austere. The hard limits are:

  • classic_minus_austere scenario success delta <= 0.20
  • ecological_minus_austere scenario success delta <= 0.15
  • classic_minus_austere mean reward gap <= 0.50
  • ecological_minus_austere mean reward gap <= 0.40
  • austere survival rate >= 0.50

Warnings fire before gates. For upper-bound gaps, the warning threshold is 0.80 * hard_limit. For austere survival, the warning band starts below 0.50 / 0.80 = 0.625 and becomes a gate violation below 0.50.

Evidence criteria classify reward terms as follows:

  • removed or removable: austere survival reaches the threshold on all relevant scenarios and claim tests do not regress.
  • weakened: austere survival reaches the threshold on a majority of relevant scenarios, while remaining classic/ecological gaps stay bounded.
  • defended: the term has documented physiological or ecological necessity, not just behavioral convenience.
  • under_investigation: more scenario evidence is needed before the term can be defended, weakened, or removed.
  • outcome_signal: sparse outcome anchors such as feeding, predator contact, and death are tracked separately from dense progress shaping.

Scenario requirements decide how austere survival affects reports:

  • Gate scenarios: night_rest, predator_edge, entrance_ambush, shelter_blockade, two_shelter_tradeoff, visual_olfactory_pincer, olfactory_ambush, and visual_hunter_open_field.
  • Warning scenarios: recover_after_failed_chase, food_vs_predator_conflict, and sleep_vs_exploration_conflict.
  • Diagnostic scenarios: open_field_foraging, corridor_gauntlet, exposed_day_foraging, and food_deprivation.

Primary claim tests now depend on austere survival. If a primary claim has austere_survival_required=True, every gate scenario in that claim must pass the austere survival threshold before the claim can pass. This applies to learning_without_privileged_signals, escape_without_reflex_support, and specialization_emerges_with_multiple_predators.

When a benchmark summary includes austere comparison data, read these fields first:

{
  "austere_survival_summary": {
    "overall_survival_rate": 0.75,
    "gate_pass_count": 6,
    "gate_fail_count": 2,
    "warning_scenarios": [],
    "gap_policy_violations": []
  },
  "austere_survival_gate_passed": false,
  "shaping_dependent_behaviors": [
    {
      "scenario": "night_rest",
      "profile": "classic",
      "success_rate_delta": 0.35,
      "limit": 0.20
    }
  ]
}

In this example, the overall austere rate is informative but not sufficient: two gate scenarios failed, so primary claims that rely on those gates cannot be treated as valid. The shaping_dependent_behaviors entry says classic outperformed austere on night_rest beyond the policy limit, so the roadmap should treat the relevant rest or sleep terms as reduction risks.

Deterministic Scenarios

Run a single scenario:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 0 \
  --eval-episodes 0 \
  --scenario night_rest

Run the full scenario suite:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 0 \
  --eval-episodes 0 \
  --scenario-suite

Available scenarios:

  • night_rest
  • predator_edge
  • entrance_ambush
  • open_field_foraging
  • shelter_blockade
  • recover_after_failed_chase
  • corridor_gauntlet
  • two_shelter_tradeoff
  • exposed_day_foraging
  • food_deprivation
  • visual_olfactory_pincer
  • olfactory_ambush
  • visual_hunter_open_field
  • food_vs_predator_conflict
  • sleep_vs_exploration_conflict

Scenario-to-map specializations include:

  • entrance_ambush, shelter_blockade, and recover_after_failed_chase use entrance_funnel
  • open_field_foraging and exposed_day_foraging use exposed_feeding_ground
  • visual_olfactory_pincer and visual_hunter_open_field use exposed_feeding_ground
  • olfactory_ambush uses entrance_funnel
  • corridor_gauntlet uses corridor_escape
  • two_shelter_tradeoff uses two_shelters

Multi-predator scenario intent:

  • visual_olfactory_pincer: the spider starts between a visible visual hunter in front and an olfactory hunter behind and downwind; it tests dual-threat perception and module specialization
  • olfactory_ambush: an olfactory hunter waits near a shelter entrance where the spider can smell danger without seeing it; it targets sensory-cortex-led response
  • visual_hunter_open_field: a fast visual hunter pressures the spider in open terrain; it targets visual-cortex-led response under exposed conditions

Behavioral Evaluation

Run the full behavioral suite with explicit scorecards:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 0 \
  --eval-episodes 0 \
  --behavior-suite \
  --full-summary

Run one behavioral scenario and export flat CSV:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 0 \
  --eval-episodes 0 \
  --behavior-scenario night_rest \
  --behavior-csv spider_behavior.csv \
  --full-summary

summary["behavior_evaluation"] includes:

  • suite: aggregated scenario scorecards with success_rate, checks, behavior_metrics, diagnostics, and failures
  • summary: overall suite success and detected regressions
  • comparisons: optional profile/map/seed comparison matrices when behavioral comparison flags are used
  • learning_evidence: optional comparison between a trained checkpoint and controls such as random_init, reflex_only, freeze_half_budget, and trained_long_budget
  • claim_tests: optional experiment-of-record synthesis that composes the canonical learning-evidence, ablation, and noise-robustness primitives into per-claim pass/fail results

Weak-signal scenarios now publish scenario-owned interpretation metadata:

  • diagnostic_focus
  • success_interpretation
  • failure_interpretation
  • budget_note

The scenario diagnostics block also summarizes:

  • primary_outcome
  • outcome_distribution
  • partial_progress_rate
  • died_without_contact_rate

Capability Probes

Behavioral scenarios are split between emergence gates and capability probes. Emergence gates support claim tests: they ask whether learned behavior remains present without privileged support, improves with memory, survives noise, or specializes by predator type. Capability probes map narrower behavioral boundaries inside the full benchmark. They can score 0.00 while still producing interpretable evidence through failure_mode, progress_band, and outcome_band.

The explicit capability probes are:

Scenario Target skill Acceptable partial progress
open_field_foraging food_vector_acquisition_exposed Any positive food_distance_delta or left_shelter with food signal present indicates food-vector acquisition capability even if foraging_viable fails.
corridor_gauntlet corridor_navigation_under_threat Any positive food_distance_delta without contact indicates corridor navigation capability; survived_no_progress indicates shelter-exit failure distinct from navigation failure.
exposed_day_foraging daytime_foraging_under_patrol Any positive food_distance_delta indicates foraging capability even under threat; cautious_inert indicates arbitration chose safety over food.
food_deprivation hunger_driven_commitment commits_to_foraging=True with approaches_food=True indicates commitment capability even if timing_failure prevents full success.

These four scenarios remain in the full behavioral benchmark with benchmark_tier="capability" and is_capability_probe=True. They are excluded from claim tests because their role is to expose capability boundaries and calibration outcomes, not to serve as pass/fail evidence for the repository's emergence hypotheses.

Emergence Hypothesis

The scientific question in this repository is narrower than "does the score go up?" The emergence hypothesis is that the modular cortex learns reusable threat-sensitive behavior that remains present when privileged supports are removed, stays coherent under disturbance, and differentiates between predator types rather than collapsing into one generic escape reflex.

The supporting workflows still matter, but they are no longer the experiment-of-record by themselves:

  • ablations isolate which modules and memory pathways matter
  • learning-evidence comparisons separate trained behavior from initialization or reflex-only baselines
  • the noise matrix checks whether behavior survives train/eval mismatch

Those workflows provide the raw evidence. The claim test suite is the formal gate that reads them together and decides whether the core scientific claims actually hold.

Claim Test Suite

Run the canonical claim suite and write the full experiment record to JSON:

$PYTHON_BIN -m spider_cortex_sim \
  --claim-test-suite \
  --summary results.json

The five canonical claim tests are:

  • learning_without_privileged_signals Hypothesis: trained behavior still beats an untrained policy after privileged reflex support is removed. Protocol: learning-evidence comparison from random_init to trained_without_reflex_support across night_rest, predator_edge, entrance_ambush, shelter_blockade, and two_shelter_tradeoff, with the leakage audit enforced. Success criterion: trained_without_reflex_support must improve scenario_success_rate over random_init by at least 0.15 and the leakage audit must report zero unresolved privileged-signal findings.

  • escape_without_reflex_support Hypothesis: predator escape remains learned behavior rather than a reflex-only artifact. Protocol: learning-evidence comparison from reflex_only to trained_without_reflex_support over predator_edge, entrance_ambush, and shelter_blockade. Success criterion: trained_without_reflex_support must reach predator-response scenario_success_rate >= 0.60 and exceed reflex_only by at least 0.10.

  • memory_improves_shelter_return Hypothesis: recurrent memory improves delayed shelter return and shelter trade-off behavior. Protocol: ablation comparison of modular_recurrent versus modular_full on night_rest and two_shelter_tradeoff. Success criterion: modular_recurrent must improve shelter-return scenario_success_rate by at least 0.10.

  • noise_preserves_threat_valence Hypothesis: threat-sensitive arbitration survives train/eval noise mismatch instead of working only on the diagonal. Protocol: canonical noise-robustness matrix, comparing diagonal and off-diagonal aggregate scores over the threat-response scenarios. Success criterion: the off-diagonal threat-response score must stay at least 0.60, and the diagonal-minus-off-diagonal gap must stay at most 0.15.

  • specialization_emerges_with_multiple_predators Hypothesis: multiple predator ecologies produce predator-type specialization instead of one undifferentiated threat pathway. Protocol: predator-type ablation comparison across visual_olfactory_pincer, olfactory_ambush, and visual_hunter_open_field, paired with type-specific cortex engagement checks in the full modular policy. Success criterion: drop_visual_cortex must drive visual_minus_olfactory_success_rate <= -0.10, drop_sensory_cortex must drive visual_minus_olfactory_success_rate >= 0.10, and the reference policy must show the expected cortex engagement in at least 2 of the 3 specialization scenarios.

Generic benchmarks such as the ablation suite and the noise matrix still provide the supporting detail, but the claim tests are the scientific nucleus: they are the concise pass/fail record for whether the main emergence story survives contact with the actual measurements.

Graphical Interface (Pygame)

$PYTHON_BIN -m spider_cortex_sim \
  --gui \
  --episodes 120 \
  --eval-episodes 3 \
  --max-steps 90 \
  --reward-profile classic \
  --map-template central_burrow

The GUI window is resizable. When you resize it, the grid automatically zooms to fit the available space. The model/architecture controls stay docked in the left sidebar, and the ecological diagnostics stay docked in the right-side panel.

To adjust scaling behavior, see the GUI controller logic (spider_cortex_sim/gui/controller.py), including the min/max cell size clamps and the ui_scale used for font sizing.

The GUI displays:

  • a left model selector for modular_full, A0, B0 current bridge, and B0 legacy
  • visible architecture metadata including runtime, action space, B level, B mode, seed, map, and reward profile
  • B-series diagnostics such as learned semantic action, selected semantic action, B0-current source/reason, bridge primitive action, bridge reason, and external override count
  • an Evolution save action that writes audit snapshots under artifacts/gui_evolution_snapshots/
  • the predator lizard on the grid
  • shelter roles and terrain types
  • contact, sighting, and escape counters
  • lizard position, mode, and current target
  • explicit spider memory and sleep debt
  • recent reward components
  • nighttime shelter-role distribution and predator-state occupancy

B0 legacy semantic is shown as an isolated benchmark mode. It does not use the current SpiderWorld, does not add semantic actions to the current public action space, and uses a simplified legacy panel/grid so it is not confused with current-world ecological success. B0 current bridge runs in the current world as a simple semantic bridge and still submits only primitive actions to world.step().

Useful shortcuts:

  • V: toggle visibility overlay
  • M: toggle smell heatmap

Saving And Loading The Brain

Train and save:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 120 --eval-episodes 3 --max-steps 90 \
  --save-brain spider_brain

Load and continue training:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 60 --eval-episodes 3 --max-steps 90 \
  --load-brain spider_brain \
  --save-brain spider_brain

Load only selected modules:

$PYTHON_BIN -m spider_cortex_sim \
  --episodes 60 --eval-episodes 3 --max-steps 90 \
  --load-brain spider_brain \
  --load-modules visual_cortex hunger_center

Compatibility notes:

  • the current architecture uses an explicit interface signature
  • older saves predating the current interface standardization are rejected with explicit incompatibility errors
  • older checkpoints predating sleep_phase, rest_streak, sleep_debt, shelter-role signals, certainty/occlusion signals, explicit memory, or oriented perception are also incompatible with the current architecture
  • the versioned interface registry and generated contract docs are in docs/interfaces.md

Because interface descriptions are part of the current fingerprinted metadata, this English translation pass also changes interface and architecture fingerprints. Older checkpoints may therefore fail compatibility checks even though behavior and identifiers were not intentionally refactored.

Tests

Run the full suite:

$PYTHON_BIN -m unittest discover -s tests -v

The tests cover:

  • standardized interface shapes and generated interface docs
  • contextual feeding and rest
  • auditable reward decomposition
  • sleep progression SETTLING -> RESTING -> DEEP_SLEEP, sleep debt, and interruptions
  • predator contact with real damage
  • shelter geometry and occlusion
  • explicit spider memory
  • map templates and reachability
  • deterministic scenario regressions
  • memory-guided escape and foraging after loss of sight
  • scenario runners and deterministic predator-response latency
  • online parameter updates
  • lightweight trainability checks across reward profiles, comparisons, and alternate maps

Metrics And Tracing

The summary and trace include:

  • per-step reward_components
  • reward_audit, including component inventory, shaping categories, and leakage candidates
  • nighttime shelter occupancy and nighttime stillness
  • nighttime shelter-role distribution (outside, entrance, inside, deep)
  • predator-response latency
  • predator contacts, escapes, and response latency by predator type
  • dominant module response by predator type
  • reward_profile and map_template
  • config.operational_profile, including active thresholds and operational weights
  • config.budget, including resolved profile, benchmark strength, seeds, and explicit overrides
  • checkpointing when --checkpoint-selection best is used
  • explicit certainty and occlusion fields per visual channel
  • world-layer maintained heading and decayed percept traces in trace metadata
  • explicit predator_motion_salience
  • normalized memory vectors in trace metadata
  • predator occupancy by state (PATROL, ORIENT, INVESTIGATE, CHASE, WAIT, RECOVER)
  • predator state transitions and dominant predator state per episode or scenario
  • food and shelter distance deltas
  • event_log stages per tick
  • behavioral scorecards per scenario in behavior_evaluation
  • diagnostic per-episode bands such as progress_band and outcome_band
  • explicit action_center arbitration outputs such as winning_valence, valence_scores, module_gates, suppressed_modules, and evidence

When --debug-trace is combined with --trace, each tick also includes:

  • serialized observations before and after transition
  • reward components
  • normalized memory vectors
  • per-module logits before reflex, reflex delta, and post-reflex logits
  • logits after valence gating, per-module gate_weight, and debug.arbitration
  • effective_reflex_scale, module_reflex_override, module_reflex_dominance, and final_reflex_override
  • full predator internal state

Use --full-summary to print the complete JSON summary to stdout.

Specialization metrics compare how modules respond when visual versus olfactory predators are the primary threat. A high predator-type specialization score means response distributions differ by predator type, such as stronger visual_cortex dominance for visual hunters and stronger sensory_cortex dominance for olfactory hunters. A low score means the same modules respond similarly to both predator types, which can be useful as a baseline but does not show sensory-niche specialization.

Budget Profiles

The CLI exposes explicit budget profiles:

  • smoke: quick sanity or CI profile (6 episodes, 1 evaluation run, 60 steps, seed 7)
  • dev: short reproducible local benchmark (12 episodes, 2 evaluation runs, 90 steps, seeds 7/17/29)
  • report: stronger reporting workflow (24 episodes, 4 evaluation runs, 120 steps, 2 repetitions per scenario, seeds 7/17/29/41/53)
  • paper: publication-grade benchmark-of-record workflow; requires --checkpoint-selection best and records the resolved seed and checkpoint budget in the summary

Canonical commands:

# smoke: sanity / CI
$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile smoke \
  --behavior-suite --full-summary

# dev: fast local benchmark
$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile dev \
  --ablation-suite --full-summary

# report: stronger benchmark + automatic checkpoint selection
$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile report \
  --checkpoint-selection best \
  --ablation-suite --full-summary

Without --budget-profile, the run still works in custom mode and records the effective values and overrides in summary["config"]["budget"].

Benchmark Of Record

Use the paper budget with best-checkpoint selection for publication-facing architecture claims. Add --benchmark-package to write a reproducible package containing the manifest, resolved configuration, seed-level rows, uncertainty-aware aggregate tables, claim-test tables, effect-size tables, reports, plots, supporting CSVs, and limitations.

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile paper \
  --checkpoint-selection best \
  --ablation-suite \
  --summary spider_architecture_paper_summary.json \
  --behavior-csv spider_architecture_paper_rows.csv \
  --benchmark-package spider_architecture_paper_package \
  --full-summary

The package manifest records file hashes, seed count, confidence level, resolved budget metadata, checkpoint-selection metadata, and an environment block with git commit/tag/dirty-state plus Python version and platform. pip_freeze.txt preserves the dependency snapshot as a hashed package artifact. The CLI rejects --benchmark-package unless both --budget-profile paper and --checkpoint-selection best are present.

Uncertainty reporting is seed-level. Confidence intervals are percentile bootstrap intervals over seed-level metric values and default to 95%. Claim-test pass/fail logic remains based on point estimates; the package adds reference_uncertainty, comparison_uncertainty, delta_uncertainty, and effect_size_uncertainty for reporting. Effect-size tables report Cohen's d with negligible, small, medium, and large magnitude labels.

Comparison Workflows

Compare reward profiles on the current map:

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile dev \
  --compare-profiles --full-summary

Compare maps under the current reward profile:

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile dev \
  --reward-profile ecological \
  --compare-maps --full-summary

Compare the behavioral suite across profiles:

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile dev \
  --behavior-compare-profiles --full-summary

Compare the behavioral suite across maps and export CSV:

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile dev \
  --reward-profile ecological \
  --behavior-compare-maps \
  --behavior-csv spider_behavior_compare.csv \
  --full-summary

The shaping audit uses austere as the minimal baseline and records deltas against it under summary["behavior_evaluation"]["shaping_audit"].

Offline Analysis

The project includes a separate runner that transforms summary.json, trace.jsonl, and behavior_csv into an offline analysis bundle:

$PYTHON_BIN -m spider_cortex_sim.offline_analysis \
  --summary spider_summary_compare.json \
  --trace spider_trace_debug.jsonl \
  --behavior-csv spider_behavior_compare.csv \
  --output-dir offline_analysis

Rules:

  • --output-dir is required
  • at least one of --summary, --trace, or --behavior-csv is required
  • the report is always emitted, even with partial input
  • missing blocks are reported in report.md and report.json instead of aborting execution

The offline analysis output directory also includes a navigation index at INDEX.md.

  • Open INDEX.md first to jump to scenario summaries, gate results, claims, and evidence.
  • All links are relative, so you can move/copy the report folder and still browse it locally.

Ablations And Learning Evidence

Compare the modular reference against the canonical ablation suite:

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile dev \
  --ablation-suite \
  --behavior-csv spider_ablation_rows.csv \
  --full-summary

Run the learning_evidence suite under the smoke budget:

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile smoke \
  --learning-evidence \
  --behavior-scenario night_rest \
  --behavior-csv spider_learning_evidence_rows.csv \
  --full-summary

Run the canonical short-vs-long learning-evidence comparison:

$PYTHON_BIN -m spider_cortex_sim \
  --budget-profile smoke \
  --learning-evidence \
  --learning-evidence-long-budget-profile report \
  --behavior-suite \
  --summary spider_learning_evidence_summary.json \
  --behavior-csv spider_learning_evidence_rows.csv \
  --full-summary

The detailed ablation workflow, variant definitions, and canonical check-in table live in docs/ablation_workflow.md.

Modeling Notes

  • The system is biologically inspired, not biologically faithful
  • The "return vector to shelter" is a simplified form of proprioception or minimal spatial memory
  • Explicit memory is perception-grounded: data sources are limited to local visual perception, contact events, and movement history. The environment pipeline maintains mechanics such as aging and TTL expiration, but it cannot inject information the spider has not perceived.
  • Local per-module reflexes act like innate behavior that online learning later refines
  • There is no giant fallback center that integrates everything; each proposer receives only its own interface and emits only standardized locomotion proposals

Natural Extensions

  1. implemented: multiple predators with different sensory niches
  2. implemented: a more strongly oriented field of view with active sensing
  3. separate locomotion into gait, speed, and body orientation
  4. migrate the networks to PyTorch while preserving the same modular interface signature

Active sensing now uses a tightened 45 degree foveal cone and 70 degree peripheral cone. ORIENT_* actions refresh current-tick perception immediately after the heading change, and scan recency is tracked so observations can distinguish fresh inspection from stale or never-scanned headings.

About

Biologically Inspired Organism, Not a Game

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages