feat(memory-sync): migrate Notion/ClickUp/Linear/GitHub Composio providers off UnifiedMemory #2885

Description

@justinhsu1477

Summary

While auditing the #2705 silent-failure root cause for #2720, I noticed the same architectural pattern — provider writes to the legacy UnifiedMemory backend, retrieval reads from memory_tree — repeated across four native Composio providers. Slack and Gmail were already migrated; the remaining four were not.

Audit findings

`grep` over `src/openhuman/memory_sync/composio/providers/`:

| Provider | Ingest path | Backend written |
|---|---|---|
| Slack | `ingest_pipeline::ingest_chat` | ✅ memory_tree (`mem_tree_chunks`) |
| Gmail | `ingest_pipeline::ingest_email` | ✅ memory_tree |
| Notion | `persist_single_item` → `store_skill_sync` | ❌ UnifiedMemory (`memory_docs`) |
| ClickUp | `persist_single_item` → `store_skill_sync` | ❌ UnifiedMemory |
| Linear | `persist_single_item` → `store_skill_sync` | ❌ UnifiedMemory |
| GitHub | `persist_single_item` → `store_skill_sync` | ❌ UnifiedMemory |

Why this matters

Same symptom as #2705 but for Composio rather than vault:

  • Synced Notion / ClickUp / Linear / GitHub items don't appear in `mem_tree_chunks` or `mem_tree_ingested_sources`.
  • Every modern retrieval surface (`memory.search`, `tree.read_chunk`, `tree.browse`, the agent's recall path, summary trees, `tree.top_entities`) reads from memory_tree — so this data is invisible to all of them.
  • Claude Desktop / Cursor / any MCP client doing `memory.search` against the user's OpenHuman gets zero Notion / ClickUp / Linear / GitHub hits even when the user has those toolkits connected.
  • The legacy `UnifiedMemory::list_documents` / `query_namespace` API still surfaces the data for callers that use it directly, so this is a partial desync. That is less catastrophic than the full-empty case in #2705 (Vault sync UI shows "synced" while `mem_tree_chunks` and `mem_tree_ingested_sources` stay at 0), but still wrong.
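
To make the desync concrete, here is a toy model of the two backends. It is only a sketch: `Backends`, `write_legacy`, `write_tree`, and `search` are hypothetical names invented for illustration and are not the OpenHuman API. The only thing it mirrors from the audit is that the legacy write path lands in `memory_docs` while modern retrieval reads `mem_tree_chunks`.

```rust
use std::collections::HashMap;

/// Toy model of the two stores. Field names echo the tables in the issue;
/// the struct itself is hypothetical.
#[derive(Default)]
pub struct Backends {
    memory_docs: HashMap<String, String>,     // legacy UnifiedMemory store
    mem_tree_chunks: HashMap<String, String>, // memory_tree store
}

impl Backends {
    /// Models the legacy write path (`persist_single_item` -> `store_skill_sync`).
    pub fn write_legacy(&mut self, key: &str, text: &str) {
        self.memory_docs.insert(key.to_string(), text.to_string());
    }

    /// Models the migrated write path into memory_tree.
    pub fn write_tree(&mut self, key: &str, text: &str) {
        self.mem_tree_chunks.insert(key.to_string(), text.to_string());
    }

    /// Models a modern retrieval surface: it only ever reads memory_tree.
    pub fn search(&self, needle: &str) -> Vec<&str> {
        let mut hits: Vec<&str> = self
            .mem_tree_chunks
            .iter()
            .filter(|(_, text)| text.contains(needle))
            .map(|(key, _)| key.as_str())
            .collect();
        hits.sort();
        hits
    }

    /// Models the legacy listing API: it only ever reads memory_docs.
    pub fn list_documents(&self) -> Vec<&str> {
        let mut keys: Vec<&str> = self.memory_docs.keys().map(|k| k.as_str()).collect();
        keys.sort();
        keys
    }
}
```

In this model, an item written through `write_legacy` shows up in `list_documents` but never in `search`, which is the partial desync described above. Writing the same item through `write_tree` makes it searchable.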

Fix shape

Per provider:

  1. Add a new `providers//ingest.rs` module that mirrors the Slack canonical pattern:
    • Single `ingest_into_memory_tree(config, owner, connection_id, item)` async function
    • Routes through `memory::ingest_pipeline::ingest_document` with a stable `source_id = "{toolkit}:{connection_id}:{item_id}"`
  2. Update the provider's `sync()` to call the new `ingest_*` instead of `persist_single_item`.
  3. `source_id` is the dedup key for memory_tree's append-only invariant; re-sync of unchanged items short-circuits via the pipeline's `already_ingested` gate, mirroring how Slack handles re-paged messages.
  4. For each provider, add a regression test that pins the invariant: "a single sync of N items ⇒ N rows in `mem_tree_ingested_sources` and ≥N rows in `mem_tree_chunks`."
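
The dedup and invariant logic in steps 1 to 4 can be sketched against an in-memory stand-in. Everything here is an assumption made for illustration: `ToyMemoryTree`, `Item`, and the paragraph-based chunking are hypothetical, and the real `ingest_document` signature and `already_ingested` gate may look quite different. Only the `source_id` format is taken from the issue text.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a synced provider item (not the real type).
#[derive(Debug, Clone)]
pub struct Item {
    pub id: String,
    pub body: String,
}

/// Toy in-memory model of the two tables named in the issue.
#[derive(Default)]
pub struct ToyMemoryTree {
    ingested_sources: HashSet<String>, // models mem_tree_ingested_sources
    chunks: Vec<(String, String)>,     // models mem_tree_chunks: (source_id, text)
}

/// Builds the stable dedup key in the format given in the issue:
/// "{toolkit}:{connection_id}:{item_id}".
pub fn source_id(toolkit: &str, connection_id: &str, item_id: &str) -> String {
    format!("{}:{}:{}", toolkit, connection_id, item_id)
}

impl ToyMemoryTree {
    /// Returns true if the item was newly ingested, false if the
    /// already-ingested gate short-circuited (assumed behavior).
    pub fn ingest(&mut self, toolkit: &str, connection_id: &str, item: &Item) -> bool {
        let sid = source_id(toolkit, connection_id, &item.id);
        // HashSet::insert returns false when the key was already present,
        // which plays the role of the dedup gate here.
        if !self.ingested_sources.insert(sid.clone()) {
            return false;
        }
        // Toy chunking: one chunk per non-empty paragraph.
        let mut added = 0;
        for para in item.body.split("\n\n").filter(|p| !p.trim().is_empty()) {
            self.chunks.push((sid.clone(), para.to_string()));
            added += 1;
        }
        // Guarantee at least one chunk per source so that chunks >= sources.
        if added == 0 {
            self.chunks.push((sid, String::new()));
        }
        true
    }

    pub fn source_count(&self) -> usize {
        self.ingested_sources.len()
    }

    pub fn chunk_count(&self) -> usize {
        self.chunks.len()
    }
}
```

A regression test in this style would sync N items once, assert `source_count() == N` and `chunk_count() >= N`, then re-sync the same items and assert that neither count changes.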

PR sequence

Each PR is independent and can be reviewed / merged separately. I'll handle all four.

Out of scope

  • Removing `persist_single_item` / `store_skill_sync` entirely: that belongs to senamakel's follow-up to #2585 ("refactor(memory): separate tree policy from generic engine + E2E tests"), namely "Consider removing `UnifiedMemory` legacy type". This issue is scoped to migrating the four providers' ingest path; the legacy helpers stay for any non-Composio caller until the broader removal.
  • Reconciling old `memory_docs`-only data already synced into UnifiedMemory — separate question, depends on retention story.

Refs
