Releases · Parslee-ai/statebench

Memgine v1.1.0 - Deterministic Memory Engine

Memgine: A Deterministic Memory Engine for Stateful AI Agents

Implements the full state-based specification on top of StateBench v1.0.

Key Results (3-run mean ± std)

Configuration	Decision Accuracy
memgine / Opus 4.6	97.3% ± 0.5%
memgine / GPT-5.2	95.8% ± 0.4%
state_based_no_supersession / GPT-5.2	90.7% ± 0.3%
transcript_replay / GPT-5.2	81.2% ± 0.8%
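As a minimal sketch of how a "3-run mean ± std" figure like those above can be computed: the per-run accuracies below are hypothetical placeholders, not values from this release, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the release notes do not say which estimator was used.

```python
import statistics


def summarize(accuracies: list[float]) -> str:
    """Format per-run accuracies (fractions in [0, 1]) as 'mean ± std'."""
    mean = statistics.mean(accuracies)
    # Sample standard deviation (n - 1 denominator); assumption, see lead-in.
    std = statistics.stdev(accuracies)
    return f"{mean:.1%} ± {std:.1%}"


# Hypothetical accuracies from three runs of one configuration.
hypothetical_runs = [0.90, 0.92, 0.91]
summary = summarize(hypothetical_runs)
print(summary)
```

With only three runs the choice between sample and population standard deviation changes the reported spread noticeably, so it is worth stating which one a table uses.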

What's New

Query-relevance sorting — most relevant facts appear last, exploiting LLM recency attention
Engine-level access control — restricted/scoped facts never reach the model (leak rate: 13% → 0%)
Adaptive inline repair — stale conclusions placed next to corrected parent facts
Compaction architecture — threshold-based with layer-specific rules, validated at 2.2× compression
Test split validation — 92.6% (GPT-5.2) and 96.0% (Opus 4.6) on held-out data
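The query-relevance sorting item above can be illustrated with a small sketch. The `Fact` class, the word-overlap `relevance` score, and `order_for_prompt` are hypothetical names introduced for illustration only; the actual scoring function used by Memgine is not described in these notes. The sketch only shows the ordering idea: facts are sorted in ascending relevance so the most relevant one lands last in the context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str


def relevance(query: str, fact: Fact) -> float:
    """Toy score: fraction of query words that also appear in the fact."""
    query_words = set(query.lower().split())
    fact_words = set(fact.text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & fact_words) / len(query_words)


def order_for_prompt(query: str, facts: list[Fact]) -> list[Fact]:
    # Ascending sort: least relevant first, most relevant last,
    # so the best match sits closest to the end of the context.
    return sorted(facts, key=lambda f: relevance(query, f))


facts = [
    Fact("The shipping address is 12 Elm Street"),
    Fact("The user prefers email updates"),
    Fact("Shipping is free over 50 dollars"),
]
ordered = order_for_prompt("what is the shipping address", facts)
for fact in ordered:
    print(fact.text)
```

Because Python's `sorted` is stable, facts with equal scores keep their original relative order, which keeps the resulting context deterministic for a given input.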

Paper

See docs/memgine-deterministic-memory-engine.pdf for the full paper.

Key Finding

Architectural enforcement beats prompt engineering. When restricted facts are filtered by the engine rather than guarded by system prompt instructions, information leakage drops from 13% to 0%.
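A minimal sketch of what engine-level filtering can look like, as opposed to asking the model not to reveal something. `ScopedFact`, `allowed_roles`, `visible_facts`, and `build_context` are hypothetical names, and the role-based scoping model is an assumption; the release notes state only that restricted facts are filtered before they reach the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedFact:
    text: str
    # Roles allowed to see this fact; an empty set means unrestricted
    # (an assumption made for this sketch).
    allowed_roles: frozenset = frozenset()


def visible_facts(facts: list[ScopedFact], requester_role: str) -> list[ScopedFact]:
    """Drop every fact the requester is not permitted to see."""
    return [
        f for f in facts
        if not f.allowed_roles or requester_role in f.allowed_roles
    ]


def build_context(facts: list[ScopedFact], requester_role: str) -> str:
    # Only permitted facts are serialized, so a restricted fact cannot
    # leak: it is never part of the text sent to the model.
    return "\n".join(f.text for f in visible_facts(facts, requester_role))


facts = [
    ScopedFact("Order 1001 ships on Friday"),
    ScopedFact("Internal margin on order 1001 is 40%", frozenset({"finance"})),
]
customer_context = build_context(facts, "customer")
finance_context = build_context(facts, "finance")
print(customer_context)
```

The design point is that the guarantee comes from the data path rather than from model behavior: whatever the prompt says, the customer-facing context simply does not contain the restricted line.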

StateBench v1.0

Baseline Results (gpt-5.2, 50 timelines)

Baseline	Decision Accuracy	SFRR
state_based	80.3%	34.4%
rolling_summary	72.1%	21.3%
fact_extraction	63.9%	27.9%
transcript_replay	60.7%	24.6%
no_memory	26.2%	19.7%

Quick Start

pip install statebench

# Generate benchmark
statebench generate --tracks supersession commitment_durability --count 100

# Evaluate baseline
statebench evaluate -d data.jsonl -b state_based -m gpt-4o -p openai

# Compare all baselines
statebench compare -d data.jsonl -m gpt-4o -l 50
