Name	Name	Last commit message	Last commit date
parent directory ..
.env.example	.env.example
.gitignore	.gitignore
README.md	README.md
import-chatgpt.py	import-chatgpt.py
metadata.json	metadata.json
requirements.txt	requirements.txt

ChatGPT Conversation Import

Import your ChatGPT history into Open Brain as curated, searchable thoughts — not raw transcripts.

What It Does

Takes your ChatGPT data export, filters out trivial conversations (poems, one-liners, image requests), uses an LLM to distill each remaining conversation into 1-3 standalone thoughts, and loads them into your Open Brain with vector embeddings and metadata. The result is semantically searchable knowledge extracted from every meaningful ChatGPT conversation you've ever had.

Prerequisites

Working Open Brain setup (guide)
Your ChatGPT data export (Settings → Data Controls → Export Data in ChatGPT)
Python 3.10+
Your Supabase project URL and service role key (from your credential tracker)
OpenRouter API key (for LLM summarization and embedding generation)

Credential Tracker

Copy this block into a text editor and fill it in as you go.

CHATGPT CONVERSATION IMPORT -- CREDENTIAL TRACKER
--------------------------------------

FROM YOUR OPEN BRAIN SETUP
  Supabase Project URL:  ____________
  Supabase Secret key:   ____________
  OpenRouter API key:    ____________

FILE LOCATION
  Path to ChatGPT export:  ____________

--------------------------------------

Steps

1. Export your data from ChatGPT

Go to ChatGPT → Settings → Data Controls → Export Data. You'll receive an email with a download link within a few minutes. Download the zip file.

2. Clone this recipe folder

# From the OB1 repo root
cd recipes/chatgpt-conversation-import

Or copy the files (import-chatgpt.py, requirements.txt) into any working directory.

3. Install dependencies

pip install -r requirements.txt

This installs requests — the only external dependency.

4. Set your environment variables

export SUPABASE_URL=https://YOUR_PROJECT_REF.supabase.co
export SUPABASE_SERVICE_ROLE_KEY=your-service-role-key-here
export OPENROUTER_API_KEY=sk-or-v1-your-key-here

All three values come from your credential tracker. You can also copy .env.example to .env and fill it in, then run export $(cat .env | xargs).

5. Do a dry run first

python import-chatgpt.py path/to/chatgpt-export.zip --dry-run --limit 10

This parses, filters, and summarizes 10 conversations without writing anything to your database. Review the output to see what would be imported and how the LLM distills each conversation.

6. Run the full import

python import-chatgpt.py path/to/chatgpt-export.zip

The script will:

Extract conversations from the zip (or directory)
Filter out trivial conversations (see How It Works below)
Summarize each remaining conversation into 1-3 standalone thoughts via LLM
Generate a vector embedding for each thought
Insert each thought into your thoughts table in Supabase

Progress prints to the console as it runs. A sync log (chatgpt-sync-log.json) tracks which conversations have been imported, so you can safely re-run the script after future exports without duplicating data.

7. Verify in your database

Open your Supabase dashboard → Table Editor → thoughts. You should see new rows with:

content: prefixed with [ChatGPT: title | date]
metadata: includes source: "chatgpt", conversation title, date, and URL
embedding: a 1536-dimension vector

8. Test a search

In any MCP-connected AI (Claude Desktop, ChatGPT, etc.), ask:

Search my brain for topics I discussed with ChatGPT about [something you know you talked about]

Expected Outcome

After a full import, your thoughts table contains distilled knowledge from every non-trivial ChatGPT conversation. Each thought is a standalone statement (not a raw transcript) that makes sense without the original conversation context.

From a real production run with ~2 years of ChatGPT history:

Metric	Value
Conversations scanned	741
Filtered as trivial	437 (59%)
Processed	304
Thoughts generated	589
Estimated API cost	$0.08

The filtering is aggressive by design — most ChatGPT conversations are throwaway Q&A. The script keeps only conversations with enough substance to produce lasting knowledge.

How It Works

Three-stage pipeline

Stage 1: Filtering — Each conversation passes through 6 filters before it reaches the LLM:

Filter	What it catches
Already imported	Conversations processed in a previous run (sync log)
Too few messages	< 4 messages total (not enough substance)
Too little text	< 20 words of user text
Title patterns	Poems, jokes, image generation, translations, bedtime stories
Do-not-remember	Conversations you marked as "don't remember" in ChatGPT
Date range	Outside your `--after` / `--before` window

Stage 2: Summarization — Surviving conversations go to an LLM (gpt-4o-mini by default via OpenRouter) with a carefully tuned prompt. The LLM extracts 1-3 standalone thoughts per conversation, focusing on:

Decisions and reasoning
People and relationships
Strategies and architectural choices
Lessons learned and preferences

The LLM is instructed to return empty for conversations that are just generic Q&A, coding help without decisions, or creative tasks.

Stage 3: Ingestion — Each thought gets a vector embedding (text-embedding-3-small, 1536 dimensions) and is inserted into your thoughts table with metadata linking back to the original ChatGPT conversation.

Deduplication

The sync log (chatgpt-sync-log.json) stores a hash of each processed conversation. Re-running the script after a new ChatGPT export only processes new conversations. The hash is based on conversation title + creation timestamp.

Options Reference

Flag	Description	Default
`--dry-run`	Parse, filter, summarize — but don't write to database	Off
`--after YYYY-MM-DD`	Only process conversations created after this date	None
`--before YYYY-MM-DD`	Only process conversations created before this date	None
`--limit N`	Max conversations to process (0 = unlimited)	0
`--model openrouter`	LLM backend for summarization: `openrouter` or `ollama`	`openrouter`
`--ollama-model NAME`	Which Ollama model to use (requires `--model ollama`)	`qwen3`
`--raw`	Skip LLM summarization, ingest user messages as-is	Off
`--verbose`	Print full thought text during processing	Off
`--report FILE`	Write a markdown report of everything imported	None
`--ingest-endpoint`	Use custom `INGEST_URL`/`INGEST_KEY` instead of Supabase direct insert	Off

Using a local LLM (free, private)

If you don't want to send your conversations to OpenRouter, use Ollama for local summarization:

# Install Ollama and pull a model
ollama pull qwen3

# Run with local LLM
python import-chatgpt.py export.zip --model ollama --ollama-model qwen3

Note: embeddings still use OpenRouter (text-embedding-3-small) for Supabase direct insert mode. Only the summarization step runs locally.

Cost Estimates

All costs are via OpenRouter at current pricing.

Component	Model	Cost
Summarization	gpt-4o-mini	~$0.15/1M input + $0.60/1M output
Embeddings	text-embedding-3-small	~$0.02/1M tokens

Typical costs by export size:

Export size	Processed	Thoughts	Est. cost
100 conversations	~40	~80	~$0.01
500 conversations	~200	~400	~$0.05
1000 conversations	~400	~800	~$0.10
5000 conversations	~2000	~4000	~$0.50

These assume ~60% of conversations are filtered as trivial and ~2 thoughts per conversation.

Troubleshooting

Issue: conversations.json not found in the export Solution: ChatGPT exports come as a zip file. Make sure you've either (a) pointed the script at the zip file directly (python import-chatgpt.py export.zip), or (b) unzipped it and pointed at the directory. The script handles both formats automatically, including the multi-file format (conversations-000.json, conversations-001.json, etc.) used in large exports.

Issue: OPENROUTER_API_KEY required error Solution: Make sure you've exported the environment variable in your current terminal session: export OPENROUTER_API_KEY=sk-or-v1-.... Environment variables don't persist between terminal windows.

Issue: Import is very slow Solution: Each conversation requires one LLM call (summarization) and 1-3 embedding calls (one per thought). For 500+ conversations, expect 15-30 minutes. Use --limit 10 to test first, then run the full import. Progress prints to the console so you can see it working.

Issue: Most conversations return "No thoughts extracted" Solution: This is expected behavior. The LLM is deliberately selective — it only extracts knowledge worth retrieving months from now. Generic Q&A, coding help, and creative tasks get empty summaries. Use --raw if you want to import everything without filtering.

Issue: Some conversations are missing after import Solution: Conversations with fewer than 4 messages or fewer than 20 words of user text are filtered automatically. Title patterns like "poem", "joke", "image of" are also filtered. Run with --dry-run --verbose to see what's being filtered and why.

Issue: Want to re-import after a new ChatGPT export Solution: Just run the script again pointing at your new export. The sync log (chatgpt-sync-log.json) tracks which conversations have been processed. Only new conversations will be imported. If you want to start fresh, delete chatgpt-sync-log.json.

Issue: Failed to generate embedding errors Solution: Check that your OpenRouter API key is valid and has credits. Go to openrouter.ai/credits to verify your balance. The embedding model (text-embedding-3-small) costs $0.02 per million tokens — even a large import costs pennies.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

ChatGPT Conversation Import

What It Does

Prerequisites

Credential Tracker

Steps

1. Export your data from ChatGPT

2. Clone this recipe folder

3. Install dependencies

4. Set your environment variables

5. Do a dry run first

6. Run the full import

7. Verify in your database

8. Test a search

Expected Outcome

How It Works

Three-stage pipeline

Deduplication

Options Reference

Using a local LLM (free, private)

Cost Estimates

Troubleshooting

FilesExpand file tree

chatgpt-conversation-import

Directory actions

More options

Directory actions

More options

Latest commit

History

chatgpt-conversation-import

Folders and files

parent directory

README.md

ChatGPT Conversation Import

What It Does

Prerequisites

Credential Tracker

Steps

1. Export your data from ChatGPT

2. Clone this recipe folder

3. Install dependencies

4. Set your environment variables

5. Do a dry run first

6. Run the full import

7. Verify in your database

8. Test a search

Expected Outcome

How It Works

Three-stage pipeline

Deduplication

Options Reference

Using a local LLM (free, private)

Cost Estimates

Troubleshooting