[Hackathon] openai-llm: semantic memory plugin (recall + TTL + LRU) #4

Open: mariagorskikh wants to merge 2 commits into main from hackathon/openai-llm-semantic-memory

mariagorskikh (Collaborator) commented May 26, 2026

Which piece + why

Layer 10: Memory. The default blackboard plugin is a shared dict — perfect for state-machine agents that already know the key they want, but the wrong shape for the thing LLM agents actually do: "recall the most relevant past interaction given this prompt." The Memory layer had exactly one reference plugin; this PR adds a second that is genuinely useful for the retrieval-augmented swarm coordination scenarios NEST is meant to stress-test.

Core idea

A new memory:semantic plugin that satisfies the full Memory protocol (read / write / subscribe / cas) so it is a drop-in replacement for blackboard, but layers three LLM-agent-relevant capabilities on top:

  1. Similarity recall. recall(query, k, min_score) returns the top-k most similar stored values by cosine similarity over a deterministic hashed bag-of-(tokens + character trigrams) embedder. No external service, no API key, no GPU — and crucially byte-identical across runs, so NEST's "same seed → identical trace" guarantee survives. Character trigrams give morphological signal so recall("apple buyer", k=1) actually finds a memory that says "I want to buy apples".
  2. Capacity + LRU eviction. An optional capacity bounds the store, and the least recently used entry is evicted on overflow. Recalled entries refresh their LRU position so "useful" memories outlive dead weight. This is the axis you actually want to benchmark: what happens when 50 agents share a memory of size 100?
  3. TTL with a logical clock. Memories can age out. Pass now_fn to drive the clock from the simulator, or let the plugin tick its own counter internally. Overwriting a key resets its TTL; recall does not (popular-but-stale entries still expire).

Plus forget(key) and stats() for observability.
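
A minimal sketch of the similarity-recall idea described above, not the plugin's actual code. The vector width `DIM`, the tokenizer regex, and the helper names `features`, `embed`, and the free function `recall` are assumptions for illustration; only FNV-1a 64-bit hashing, tokens plus character trigrams, and cosine ranking come from the description.

```python
import math
import re

FNV_OFFSET = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
FNV_PRIME = 0x100000001B3        # standard FNV 64-bit prime
DIM = 1024                       # assumed vector width, not taken from the PR


def fnv1a_64(data: bytes) -> int:
    """FNV-1a 64-bit: unlike built-in hash(), stable across processes."""
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def features(text: str) -> list[str]:
    """Lowercased word tokens plus the character trigrams of each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    trigrams = [t[i:i + 3] for t in tokens for i in range(len(t) - 2)]
    return tokens + trigrams


def embed(text: str) -> list[float]:
    """Hash each feature into a bucket, then L2-normalize the counts."""
    vec = [0.0] * DIM
    for feat in features(text):
        vec[fnv1a_64(feat.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def recall(store: dict[str, str], query: str, k: int = 1,
           min_score: float = 0.0) -> list[tuple[str, float]]:
    """Top-k (key, score) pairs; vectors are unit length, so dot == cosine."""
    q = embed(query)
    hits = []
    for key, value in store.items():
        score = sum(a * b for a, b in zip(q, embed(value)))
        if score >= min_score:
            hits.append((key, score))
    # Ties are broken by key so the ordering is fully deterministic.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:k]
```

In this sketch, "apple buyer" and "I want to buy apples" share the trigrams `app`, `ppl`, `ple`, and `buy`, which is the morphological overlap the description relies on.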

Registered under the built-in plugin table as memory:semantic, so scenarios opt in by changing one YAML line. The default stays blackboard — no behavior change for anyone who doesn't ask for it.

How to test

# Unit tests (20 new, plus the original 38 in this file)
pytest packages/nest-plugins-reference/tests/test_plugins.py::TestSemanticMemory -v

# Full repo
pytest packages/                       # 279 passed

# End-to-end in a real scenario
cp scenarios/marketplace.yaml /tmp/m.yaml
sed -i 's/memory: blackboard/memory: semantic/' /tmp/m.yaml
nest run /tmp/m.yaml -o /tmp/trace.jsonl
python -c "
from pathlib import Path
from nest_core.validators import validate_trace
for r in validate_trace(Path('/tmp/trace.jsonl'), 'marketplace'):
    print(('PASS' if r.passed else 'FAIL'), r.name)
"
# All three marketplace validators PASS, and re-running gives a
# byte-identical trace (determinism preserved).
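
One way to check the byte-identical claim is to hash two trace files from separate runs and compare the digests. This is a sketch using only the standard library; the helper names `file_digest` and `traces_identical` and the second trace path in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large traces stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def traces_identical(first: Path, second: Path) -> bool:
    """True when both files contain exactly the same bytes."""
    return file_digest(first) == file_digest(second)
```

For example, after writing a second run to a hypothetical `/tmp/trace2.jsonl`, `traces_identical(Path('/tmp/trace.jsonl'), Path('/tmp/trace2.jsonl'))` should return `True` if determinism holds.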

Highlights of the test suite:

  • Protocol conformance: isinstance(mem, Memory) and registry resolution.
  • Determinism: two independent SemanticMemory() instances, same writes → identical recall results (key, value, score).
  • LRU: recall on key a protects it from eviction when capacity overflows.
  • TTL: external clock, overwrite-resets-TTL, recall skips expired entries.
  • Binary payloads: non-UTF8 bytes round-trip cleanly (indexed by hex digest).
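
The LRU behavior exercised by these tests can be sketched with an `OrderedDict`. The class below is a hypothetical stand-in, not `SemanticMemory` itself; the `touch` method stands for whatever recency refresh `recall` performs internally.

```python
from collections import OrderedDict


class LruStore:
    """Capacity-bounded store; the oldest untouched key is evicted first."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.evictions = 0
        self._items = OrderedDict()

    def write(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)          # a write counts as a use
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop least recently used
            self.evictions += 1

    def touch(self, key: str) -> None:
        """Refresh recency, as a recall hit would in this sketch."""
        if key in self._items:
            self._items.move_to_end(key)

    def keys(self) -> list[str]:
        return list(self._items)


store = LruStore(capacity=2)
store.write("a", b"1")
store.write("b", b"2")
store.touch("a")        # "a" was recalled, so "b" is now the oldest
store.write("c", b"3")  # overflow evicts "b", not "a"
```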

Key assumptions

  • Determinism is non-negotiable. Python's built-in hash is process-salted, so the embedder uses FNV-1a 64-bit explicitly. Same input → same vector, on every machine, every Python version.
  • Hashed trigrams are a baseline, not a learned embedding. A real-LLM scenario probably wants memory:openai_embeddings as a separate plugin behind the same surface; that one would be Tier-2-only because its outputs aren't reproducible. Keeping that out of this PR keeps the contribution focused and the determinism guarantee intact.
  • TTL is anchored to write time, not access time. Hot-but-stale memories still expire — matches what production retrieval stores usually want.
  • No new deps. No numpy, no sklearn, no embeddings service. Pure stdlib, runs anywhere NEST runs.
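
The write-anchored TTL rule can be illustrated as follows. This is a sketch under assumptions: the class name `TtlStore`, the `>=` expiry boundary, and the internal counter advancing once per operation are illustrative choices, not confirmed details of the plugin. Only `ttl`, `now_fn`, overwrite-resets-TTL, and read-does-not-refresh come from the description.

```python
from typing import Callable, Optional


class TtlStore:
    """Entries expire `ttl` ticks after their last write, never after a read."""

    def __init__(self, ttl: int,
                 now_fn: Optional[Callable[[], int]] = None) -> None:
        if ttl <= 0:
            raise ValueError("ttl must be positive")
        self.ttl = ttl
        self._now_fn = now_fn
        self._tick = 0          # internal logical clock (assumed behavior)
        self._items = {}        # key -> (value, written_at)

    def _now(self) -> int:
        if self._now_fn is not None:
            return self._now_fn()   # clock driven by the simulator
        self._tick += 1             # otherwise tick once per operation
        return self._tick

    def write(self, key: str, value: bytes) -> None:
        # Overwriting stores a fresh timestamp, which resets the TTL.
        self._items[key] = (value, self._now())

    def read(self, key: str) -> Optional[bytes]:
        now = self._now()
        entry = self._items.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if now - written_at >= self.ttl:
            del self._items[key]    # expired; reads never extend lifetime
            return None
        return value
```

Because `read` leaves `written_at` untouched, an entry that is read on every tick still expires on schedule, which is the "hot-but-stale" case above.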

Persona

OpenAI researcher building LLM agent orchestration; deeply interested in agent memory architectures and what gets remembered vs. evicted when many LLM agents talk to each other.

Future work

  • memory:openai_embeddings and memory:anthropic_embeddings plugins behind the same recall surface — Tier 2 only.
  • A retrieval-stress scenario (memory_swarm.yaml) where N agents share one bounded SemanticMemory and have to coordinate via recall under message drop + Byzantine fractions. The natural validator: did the swarm converge on the right memory, or did the relevant fact get evicted?
  • Validator that asserts properties on mem.stats() (e.g. eviction rate below threshold, no expiration storms).
  • Vector-store adapters (FAISS / pgvector / Chroma) registered as additional memory:* plugin names.
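
As a rough illustration of the stats-based validator idea, the sketch below assumes `stats()` returns a mapping with `writes` and `evictions` counters. The function names and the default threshold are hypothetical, and the real return type of `stats()` may differ.

```python
from typing import Mapping


def eviction_rate(stats: Mapping[str, int]) -> float:
    """Evictions per write; 0.0 when nothing has been written yet."""
    writes = stats.get("writes", 0)
    return stats.get("evictions", 0) / writes if writes else 0.0


def eviction_rate_ok(stats: Mapping[str, int], max_rate: float = 0.5) -> bool:
    """Pass when the eviction rate stays at or below the threshold."""
    return eviction_rate(stats) <= max_rate
```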

https://claude.ai/code/session_01C5j2D4MgCkPgsjSCqBVpWW


Generated by Claude Code

Summary by Sourcery

Add a new semantic memory plugin that extends the Memory layer with deterministic similarity-based recall, capacity limits, TTL expiration, and observability, and wire it into the built-in plugin registry and docs.

New Features:

  • Introduce a SemanticMemory plugin that implements the Memory protocol while providing similarity-based recall over stored values.
  • Expose additional semantic-memory operations including recall with scoring, explicit forget, and a stats endpoint for observability.

Enhancements:

  • Register the semantic memory plugin as a built-in memory:semantic option in the plugin registry so it can be selected from YAML scenarios.
  • Expand memory-layer documentation to describe the new semantic plugin, its API surface, and example usage, alongside the existing blackboard plugin.
  • Export SemanticMemory and its RecallHit type from the memory reference package for easier import by callers.

Documentation:

  • Document the new semantic memory plugin in the memory layer docs, including usage examples and guidance on when to use it.
  • Update the top-level README to note that the Memory layer now ships both the blackboard and semantic plugins.

Tests:

  • Add a comprehensive test suite for SemanticMemory covering protocol conformance, deterministic recall behavior, similarity ranking, LRU eviction, TTL expiry, stats reporting, and binary payload handling.

The Memory layer only shipped the `blackboard` plugin (a shared dict),
which is the wrong shape for LLM agents that need to recall the most
relevant past interaction given a prompt. This adds `memory:semantic`:
a drop-in `Memory` implementation that satisfies the existing
read/write/subscribe/cas protocol but additionally exposes:

- `recall(query, k, min_score)`: top-k similarity search over stored
  values, ranked by cosine similarity on a deterministic hashed
  bag-of-(tokens + char-trigrams) embedder. No external service, no
  API key, byte-identical results across runs — preserves NEST's
  Tier 1 determinism guarantee.
- `forget(key)` and `stats()` for observability.
- Optional `capacity` with LRU eviction; recalled entries refresh
  their LRU position so useful memories survive eviction.
- Optional `ttl` with logical-clock expiration; pass `now_fn` to
  share a clock with the simulator.

Registered as `memory:semantic` in the built-in plugin table so
scenarios can opt in by editing one YAML line. Verified end-to-end
with `nest run marketplace.yaml` — all three marketplace validators
pass and traces remain byte-identical across runs with the same seed.

Tested: 20 new unit tests covering protocol conformance, recall
ranking, determinism across instances, min-score filtering, LRU
eviction, TTL expiration (including overwrite-resets-TTL), recall
refreshing LRU position, binary payloads, input validation, and
plugin-registry resolution. Full suite: 279 passed.

https://claude.ai/code/session_01C5j2D4MgCkPgsjSCqBVpWW

sourcery-ai Bot commented May 26, 2026

Reviewer's Guide

Adds a new deterministic semantic memory plugin implementing the full Memory protocol, with similarity-based recall, TTL expiration, and LRU eviction, wires it into the plugin registry and docs, and provides a focused test suite to validate behavior and determinism.

File-Level Changes

Change: Introduce SemanticMemory plugin implementing Memory protocol with similarity recall, TTL, and LRU, built on a deterministic hashed embedding.
Details:
  • Implement SemanticMemory class with read/write/subscribe/cas matching the existing Memory protocol.
  • Add similarity-based recall(query, k, min_score) using deterministic FNV-1a-based hashed token + trigram embeddings and cosine similarity.
  • Implement capacity-bounded storage with LRU eviction and TTL-based expiration driven by a logical clock or injected now_fn.
  • Provide auxiliary methods forget(key) and stats() for explicit eviction and observability, and define RecallHit dataclass as the recall result type.
Files:
  • packages/nest-plugins-reference/nest_plugins_reference/memory/semantic.py

Change: Register the new semantic memory plugin and expose it through public APIs and documentation.
Details:
  • Export SemanticMemory and RecallHit from the memory package init for external use.
  • Register ('memory', 'semantic') in the core PluginRegistry so scenarios can reference memory:semantic by name.
  • Update README and memory layer docs to describe the semantic plugin, its capabilities, and usage alongside blackboard.
Files:
  • packages/nest-plugins-reference/nest_plugins_reference/memory/__init__.py
  • packages/nest-core/nest_core/plugins.py
  • README.md
  • docs/layers/memory.md

Change: Add tests to validate protocol conformance, semantic recall behavior, LRU/TTL semantics, stats, and determinism of SemanticMemory.
Details:
  • Verify SemanticMemory obeys the Memory protocol (read/write/subscribe/cas) and is registered as a builtin plugin.
  • Test similarity recall ranking, top-k ordering, min_score filtering, k=0 behavior, and handling of binary payloads.
  • Exercise capacity limits, LRU eviction, recall-driven recency updates, TTL expiration via external clock, overwrite-resets-TTL, and recall skipping expired entries.
  • Validate stats counters (size, capacity, writes, recalls, evictions, expirations), deterministic recall across independent instances, and argument validation for capacity/ttl.
Files:
  • packages/nest-plugins-reference/tests/test_plugins.py

sourcery-ai Bot left a comment

Hey - I've found 1 issue and left some high-level feedback:

  • In SemanticMemory.subscribe, consider removing the key from _subscribers when its list becomes empty in the finally block to avoid unbounded growth of empty subscriber lists over long-running simulations with many distinct keys.
  • The _Entry dataclass includes fields like text and forgotten that are never read; consider removing or using them to reduce cognitive overhead and keep the internal model minimal.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `SemanticMemory.subscribe`, consider removing the key from `_subscribers` when its list becomes empty in the `finally` block to avoid unbounded growth of empty subscriber lists over long-running simulations with many distinct keys.
- The `_Entry` dataclass includes fields like `text` and `forgotten` that are never read; consider removing or using them to reduce cognitive overhead and keep the internal model minimal.

## Individual Comments

### Comment 1
<location path="packages/nest-plugins-reference/nest_plugins_reference/memory/semantic.py" line_range="274-278" />
<code_context>
+            while True:
+                yield await q.get()
+        finally:
+            self._subscribers[key].remove(q)
+
+    async def cas(self, key: str, expected: bytes, new: bytes) -> bool:
</code_context>
<issue_to_address>
**suggestion (performance):** Subscriber cleanup leaves empty lists in `_subscribers`, which can accumulate keys over time.

After removing `q`, empty lists remain in `self._subscribers`, so keys accumulate over time. Consider cleaning up empty entries:

```python
self._subscribers[key].remove(q)
if not self._subscribers[key]:
    del self._subscribers[key]
```

This avoids unbounded growth while preserving behavior for active subscriptions.

```suggestion
        try:
            while True:
                yield await q.get()
        finally:
            subscribers = self._subscribers.get(key)
            if subscribers is not None:
                subscribers.remove(q)
                if not subscribers:
                    del self._subscribers[key]
```
</issue_to_address>
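
To see the suggested cleanup in action, here is a self-contained sketch. The `Subscribers` class and its `publish` method are hypothetical stand-ins for the relevant part of `SemanticMemory`; only the `finally` block mirrors the suggestion above.

```python
import asyncio


class Subscribers:
    """Hypothetical stand-in holding per-key subscriber queues."""

    def __init__(self) -> None:
        self._subscribers = {}  # key -> list of asyncio.Queue

    async def subscribe(self, key: str):
        q = asyncio.Queue()
        self._subscribers.setdefault(key, []).append(q)
        try:
            while True:
                yield await q.get()
        finally:
            # Runs when the generator is closed; drop the now-empty list too.
            subscribers = self._subscribers.get(key)
            if subscribers is not None:
                subscribers.remove(q)
                if not subscribers:
                    del self._subscribers[key]

    def publish(self, key: str, value: bytes) -> None:
        for q in self._subscribers.get(key, []):
            q.put_nowait(value)


async def demo():
    bus = Subscribers()
    stream = bus.subscribe("k")

    async def first_item():
        return await stream.__anext__()

    task = asyncio.create_task(first_item())
    await asyncio.sleep(0)      # let the generator register its queue
    bus.publish("k", b"v1")
    value = await task
    await stream.aclose()       # triggers the finally block
    return value, dict(bus._subscribers)


result = asyncio.run(demo())
```

After `aclose()`, the subscriber map in this sketch holds no entry for `"k"`, so distinct keys no longer accumulate empty lists.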


