[codex] Discover projects from transcript metadata #41

Merged
chioarub merged 3 commits into main from investigate/project-discovery
Jun 3, 2026

@chioarub chioarub commented Jun 3, 2026

Owner

Summary

  • Add transcript-backed project discovery from ~/.openclaude/projects/ when ~/.openclaude.json omits projects.
  • Preserve config metadata precedence, fold OpenClaude .claude/worktrees/* transcript roots into their parent project, and reject mismatched cwd rows.
  • Add server route/service coverage for transcript-only projects, encoded-path collisions, symlink safety, missing paths, and large transcript prefix rows.
  • Update docs to describe the expanded read scope and the privacy impact of historical project paths.
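
The worktree folding mentioned in the summary can be pictured with a minimal sketch. The helper name `foldWorktreePath` and the POSIX-only separator handling are assumptions for illustration; the PR's actual implementation is not shown in this conversation.

```typescript
// ASSUMPTION: hypothetical helper, POSIX separators only.
// Maps a cwd inside <project>/.claude/worktrees/<name> back to <project>.
const WORKTREE_SEGMENT = "/.claude/worktrees/";

function foldWorktreePath(cwd: string): string {
  const index = cwd.indexOf(WORKTREE_SEGMENT);
  if (index <= 0) return cwd; // not a worktree path (or no parent project)
  const worktreeName = cwd.slice(index + WORKTREE_SEGMENT.length);
  // Fold only when a worktree name actually follows the segment.
  return worktreeName.length > 0 ? cwd.slice(0, index) : cwd;
}

const folded = foldWorktreePath("/home/u/app/.claude/worktrees/feat");
const unchanged = foldWorktreePath("/home/u/app");
```

With this shape, usage recorded in a worktree session is attributed to the parent project's entry rather than creating a separate selector entry.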

Root Cause

OpenClaude Studio built the project selector only from ~/.openclaude.json. OpenClaude can retain valid project transcript directories under ~/.openclaude/projects/ even when those projects are absent from the global config, so those projects did not appear in Studio.
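
A rough sketch of the merge this fix implies, assuming a simplified `ProjectSummary` shape (the real type in `openclaudeData.ts` may differ). Configured entries keep precedence, and transcript data only fills fields the config leaves unset.

```typescript
// ASSUMPTION: simplified, hypothetical shape for illustration.
interface ProjectSummary {
  path: string;
  source: "config" | "transcript";
  lastUsedAt?: number; // epoch milliseconds
  lastSessionId?: string;
}

function mergeProjects(
  configured: ProjectSummary[],
  discovered: ProjectSummary[],
): ProjectSummary[] {
  const byPath = new Map<string, ProjectSummary>();
  // Transcript-only projects are added first.
  for (const project of discovered) byPath.set(project.path, project);
  // Configured projects overwrite them, borrowing only missing fields.
  for (const project of configured) {
    const fromTranscript = byPath.get(project.path);
    byPath.set(project.path, {
      path: project.path,
      source: "config",
      lastUsedAt: project.lastUsedAt ?? fromTranscript?.lastUsedAt,
      lastSessionId: project.lastSessionId ?? fromTranscript?.lastSessionId,
    });
  }
  return [...byPath.values()];
}

const merged = mergeProjects(
  [{ path: "/a", source: "config" }],
  [
    { path: "/a", source: "transcript", lastUsedAt: 5 },
    { path: "/b", source: "transcript", lastUsedAt: 9 },
  ],
);
```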

Validation

  • coderabbit review --agent -t uncommitted
  • npm test
  • npm run build
  • npm run lint
  • npm run test:e2e
  • git diff --check

Security and Privacy

  • The change remains read-only and does not mutate OpenClaude config, sessions, logs, project files, tasks, plans, or file-history data.
  • Transcript reads remain bounded and scoped to documented OpenClaude data locations.
  • Symlinked transcript roots/files are skipped, and transcript cwd metadata must match the transcript root before a project is added.
  • Docs now state that transcript-backed discovery can expose historical or missing project paths in the browser.
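
For readers unfamiliar with the symlink check, a minimal sketch follows. The PR describes `safeLstat` as returning null for ENOENT; the body below and the `isScannableRoot` helper are assumptions, not the merged code.

```typescript
import { lstat } from "node:fs/promises";
import type { Stats } from "node:fs";

// Sketch of an lstat wrapper that treats a missing path as "no entry".
async function safeLstat(path: string): Promise<Stats | null> {
  try {
    return await lstat(path); // lstat inspects the link itself, not its target
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === "ENOENT") return null;
    throw error; // other failures (for example EACCES) still surface
  }
}

// ASSUMPTION: hypothetical helper. Accepts real directories only, never symlinks.
async function isScannableRoot(path: string): Promise<boolean> {
  const stats = await safeLstat(path);
  return stats !== null && !stats.isSymbolicLink() && stats.isDirectory();
}
```

Using `lstat` rather than `stat` matters here: `stat` follows the link, so a symlink pointing outside `~/.openclaude/projects/` would look like an ordinary directory.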

Purpose and impact
Adds transcript-backed project discovery to OpenClaude Studio so the server surfaces valid projects found in ~/.openclaude/projects/ when they are not listed in ~/.openclaude.json. Visible effect: transcript-only projects can appear in the project selector without manual config edits.

Technical changes

  • readProjectSummariesWithDiagnostics now merges configured projects from ~/.openclaude.json with transcript-discovered projects found under paths.projectsDir, using bounded root counts, recursion depth, per-root file limits, and per-file byte limits.
  • Transcript parsing reads .jsonl session rows line-by-line (skips empty/malformed lines and meta records), maps record cwd to canonical project paths, aggregates per-project last-used timestamps and session usage (inputTokens, outputTokens, lastSessionId), and preserves configured project metadata when present.
  • Safety and correctness: skips symlinked transcript roots/files, rejects transcript rows whose cwd does not match the encoded transcript root, folds .claude/worktrees/* transcript roots into their parent project, truncates/limits large rows, and reports diagnostics for unreadable, truncated, or missing transcript roots/files.
  • Utilities: adds discovery limit constants and safeLstat (returns null for ENOENT); minor import adjustments.
  • Tests: added and expanded HTTP and service tests covering transcript-only discovery, encoded-path collisions, symlink handling, missing/unreadable paths, large transcript rows, preservation of config metadata, and related edge cases.
  • Docs: updates to README, server README, architecture, privacy-and-redaction, and troubleshooting to reflect the new discovery scope and diagnostics.
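
The bounded parsing and aggregation described above can be sketched roughly as follows. The limit value, the function name, and the row field names (`cwd`, `sessionId`, `timestamp`, `inputTokens`, `outputTokens`, `isMeta`) are assumptions for illustration; the PR does not show the actual record schema or constants.

```typescript
// ASSUMPTIONS: limit value and row field names are illustrative only.
const MAX_BYTES_PER_FILE = 64 * 1024;

interface TranscriptUsage {
  inputTokens: number;
  outputTokens: number;
  lastSessionId?: string;
  lastUsedAt?: number; // epoch milliseconds
}

function aggregateTranscript(
  raw: Buffer,
  expectedCwd: string,
): { usage: TranscriptUsage; truncated: boolean } {
  const truncated = raw.byteLength > MAX_BYTES_PER_FILE;
  const lines = raw.subarray(0, MAX_BYTES_PER_FILE).toString("utf8").split("\n");
  if (truncated) lines.pop(); // the final line may be cut mid-record
  const usage: TranscriptUsage = { inputTokens: 0, outputTokens: 0 };
  for (const line of lines) {
    if (line.trim() === "") continue; // skip empty lines
    let row: unknown;
    try {
      row = JSON.parse(line);
    } catch {
      continue; // skip malformed lines
    }
    if (typeof row !== "object" || row === null) continue;
    const record = row as Record<string, unknown>;
    // Skip meta records and rows whose cwd does not match the expected project.
    if (record.isMeta === true || record.cwd !== expectedCwd) continue;
    const input = record.inputTokens;
    const output = record.outputTokens;
    const sessionId = record.sessionId;
    const timestamp = record.timestamp;
    if (typeof input === "number") usage.inputTokens += input;
    if (typeof output === "number") usage.outputTokens += output;
    const usedAt = typeof timestamp === "string" ? Date.parse(timestamp) : NaN;
    if (Number.isFinite(usedAt) && usedAt > (usage.lastUsedAt ?? -Infinity)) {
      usage.lastUsedAt = usedAt;
      if (typeof sessionId === "string") usage.lastSessionId = sessionId;
    }
  }
  return { usage, truncated };
}
```

A caller would report a diagnostic when `truncated` is true, which matches the truncated-file warnings the tests in this PR describe.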

Compatibility, security, privacy, release notes

  • Backwards compatible and read-only: no mutation of configs, sessions, logs, project files, tasks, plans, or history.
  • Discovery is constrained to documented OpenClaude locations and enforces safety checks; symlinked roots/files are skipped and transcript cwd must match the transcript root.
  • Privacy: transcript-backed discovery can reveal historical or now-missing project paths in the browser; documentation and a “Privacy At A Glance” note were added to call this out.
  • No migration or deployment steps required beyond normal release; include updated docs in release notes.

Validation
Tests and recommended validation commands included in the PR: coderabbit review --agent -t uncommitted; npm test; npm run build; npm run lint; npm run test:e2e; git diff --check.

coderabbitai Bot commented Jun 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4d7fcdef-dc45-4bda-ae3a-2779fa9f7364

📥 Commits

Reviewing files that changed from the base of the PR and between a9382c5 and d8c3102.

📒 Files selected for processing (2)
  • apps/server/src/services/openclaudeData.test.ts
  • apps/server/src/services/openclaudeData.ts

📝 Walkthrough

Adds transcript-backed project discovery: server scans bounded transcript JSONL metadata under ~/.openclaude/projects/, aggregates cwd/usage/session/timestamp data, merges with configured projects, computes active projects by recency, exposes diagnostics, and updates tests and docs.

Changes

Transcript-based project discovery

| Layer / File(s) | Summary |
| --- | --- |
| Core transcript discovery implementation (apps/server/src/services/openclaudeData.ts) | Refactors readProjectSummariesWithDiagnostics to combine configured projects with transcript-discovered projects. Adds bounded discovery constants, JSONL scanning/parsing with per-file byte limits, per-project/session aggregation, sortProjectSummaries to set active by recency, and safeLstat for resilient existence checks. |
| Unit tests for transcript discovery (apps/server/src/services/openclaudeData.test.ts) | Adds tests covering discovery from transcript-only data, large/earlier rows, truncated files with warn diagnostics, config precedence retention, worktree folding and spoof handling, encoded-path collisions, cwd mismatch filtering, symlink traversal prevention, unreadable-root warnings, missing-path exists: false errors, plus a jsonl fixture helper. |
| HTTP API integration test (apps/server/src/http/server.test.ts) | Adds an integration test asserting /api/projects returns a transcript-discovered project with derived branch and transcript usage/session fields. |
| Documentation (README.md, apps/server/README.md, docs/architecture.md, docs/privacy-and-redaction.md, docs/troubleshooting.md) | Clarifies that project discovery uses both ~/.openclaude.json and bounded transcript metadata under ~/.openclaude/projects/, and notes that historical project paths from transcripts may appear in the browser even if missing on disk. |
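
The walkthrough says sortProjectSummaries sets active by recency but does not show how. One plausible reading, sketched under the assumption that only the most recently used project is flagged active (the function name `sortByRecency` and the `Summary` shape are hypothetical):

```typescript
// ASSUMPTION: hypothetical shape; "active" is taken to mean "most recently used".
interface Summary {
  path: string;
  lastUsedAt?: number; // epoch milliseconds
  active: boolean;
}

function sortByRecency(projects: Summary[]): Summary[] {
  const sorted = [...projects].sort(
    (a, b) =>
      (b.lastUsedAt ?? 0) - (a.lastUsedAt ?? 0) || a.path.localeCompare(b.path),
  );
  // Only a project with a known timestamp can be marked active.
  return sorted.map((project, index) => ({
    ...project,
    active: index === 0 && project.lastUsedAt !== undefined,
  }));
}

const sortedSummaries = sortByRecency([
  { path: "/old", lastUsedAt: 1, active: false },
  { path: "/new", lastUsedAt: 2, active: false },
  { path: "/never", active: false },
]);
```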

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 5

❌ Failed checks (5 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Api Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Local Data Safety | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Release Readiness | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Test Coverage | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Public Hygiene | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title '[codex] Discover projects from transcript metadata' clearly and concisely describes the main operational change: enabling project discovery from transcript metadata, which aligns directly with the primary objective of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@chioarub chioarub marked this pull request as ready for review June 3, 2026 10:10

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e33e234e-4e8d-41e6-b129-e6160f8460e7

📥 Commits

Reviewing files that changed from the base of the PR and between 64af500 and 24c5de7.

📒 Files selected for processing (8)
  • README.md
  • apps/server/README.md
  • apps/server/src/http/server.test.ts
  • apps/server/src/services/openclaudeData.test.ts
  • apps/server/src/services/openclaudeData.ts
  • docs/architecture.md
  • docs/privacy-and-redaction.md
  • docs/troubleshooting.md

Comment thread apps/server/src/services/openclaudeData.ts
Comment thread apps/server/src/services/openclaudeData.ts

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/server/src/services/openclaudeData.ts (1)

350-352: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten worktree-root matching; the prefix fallback accepts spoofed roots.

isTranscriptRootForCwd() currently treats any directory named ${encodeProjectPath(projectPath)}--claude-worktrees-* as valid, even when the row cwd is only the parent project path. That weakens the “cwd must match transcript root” check and lets unrelated roots under ~/.openclaude/projects claim a project. For real worktree transcripts, transcriptRootName === encodeProjectPath(cwd) already matches the encoded worktree path, so the prefix branch can be removed and covered with a regression test.

Suggested fix
 function isTranscriptRootForCwd(
   transcriptRootName: string,
   cwd: string,
   projectPath: string,
 ): boolean {
   const encodedProjectPath = encodeProjectPath(projectPath);
   return (
     transcriptRootName === encodeProjectPath(cwd) ||
-    transcriptRootName === encodedProjectPath ||
-    transcriptRootName.startsWith(`${encodedProjectPath}--claude-worktrees-`)
+    transcriptRootName === encodedProjectPath
   );
 }

Based on learnings: pass only if file access stays inside expected OpenClaude data locations and fail on expanded read scope without a safety explanation.

Also applies to: 402-407
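
The regression test suggested above could look roughly like this. The `encodeProjectPath` stand-in below is an assumption (it replaces every non-alphanumeric character with `-`, which is consistent with the `--claude-worktrees-` names quoted in this comment); the real encoder may differ.

```typescript
// ASSUMPTION: stand-in encoder for illustration, not the project's real one.
function encodeProjectPath(projectPath: string): string {
  return projectPath.replace(/[^a-zA-Z0-9]/g, "-");
}

// Tightened matcher from the suggested fix: no prefix fallback.
function isTranscriptRootForCwd(
  transcriptRootName: string,
  cwd: string,
  projectPath: string,
): boolean {
  return (
    transcriptRootName === encodeProjectPath(cwd) ||
    transcriptRootName === encodeProjectPath(projectPath)
  );
}

const project = "/home/u/app";
const worktree = `${project}/.claude/worktrees/feat`;
const spoofedRoot = `${encodeProjectPath(project)}--claude-worktrees-evil`;

// A real worktree transcript root matches the encoded worktree cwd.
const realWorktreeAccepted = isTranscriptRootForCwd(
  encodeProjectPath(worktree),
  worktree,
  project,
);
// A spoofed root whose rows only claim the parent project cwd is rejected.
const spoofAccepted = isTranscriptRootForCwd(spoofedRoot, project, project);
```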


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 960f1785-a29f-43f7-9be1-aca078ca738e

📥 Commits

Reviewing files that changed from the base of the PR and between 24c5de7 and a9382c5.

📒 Files selected for processing (2)
  • apps/server/src/services/openclaudeData.test.ts
  • apps/server/src/services/openclaudeData.ts

Comment thread apps/server/src/services/openclaudeData.ts
@chioarub chioarub merged commit 499e075 into main Jun 3, 2026
5 checks passed
@chioarub chioarub deleted the investigate/project-discovery branch June 3, 2026 11:32