Handle cross-thread slab frees #782

Open
SS-42 wants to merge 5 commits into DeusData:main from SS-42:fix/slab-cross-thread-free

SS-42 commented Jul 2, 2026

Summary

This fixes slab allocator ownership for tree-sitter allocations that are freed on a different parser thread than the one that allocated them.

The allocator now keeps a global slab-page registry. Each page still has an owner thread for reuse, but cross-thread frees are recognized as slab pointers instead of falling through to plain free(). Reclaim/destroy detaches pages with no live chunks and retires pages that still have live chunks, freeing them when the final chunk is returned.
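
A minimal sketch of the ownership rule described above, under stated assumptions: the names (`slab_page`, `slab_free_any`, `slab_page_retire`, `free_on_other_thread`, and so on) are hypothetical rather than the PR's identifiers, chunk reuse and size classes are omitted, and a pthread mutex stands in for the spinlock the review mentions. What it shows is that a free consults the global page registry, so a chunk freed on a non-owner thread is returned to its page instead of reaching plain free(), and a retired page is released once its last live chunk comes back.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical geometry: fixed-size chunks carved from one buffer per page. */
#define CHUNK_SIZE 64
#define CHUNKS_PER_PAGE 32
#define MAX_PAGES 128

typedef struct slab_page {
    unsigned char *base; /* start of the chunk area */
    size_t next;         /* bump index, touched only by the owner thread */
    size_t live;         /* chunks handed out and not yet returned */
    bool retired;        /* owner is done; free the page when live hits 0 */
} slab_page;

static slab_page *registry[MAX_PAGES];
static size_t registry_len;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Linear scan for the page containing ptr. Caller holds registry_lock. */
static size_t registry_find(const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    for (size_t i = 0; i < registry_len; i++) {
        uintptr_t lo = (uintptr_t)registry[i]->base;
        if (p >= lo && p < lo + CHUNK_SIZE * CHUNKS_PER_PAGE) return i;
    }
    return SIZE_MAX;
}

/* Unregister and release page i. Caller holds registry_lock. */
static void registry_remove_at(size_t i) {
    free(registry[i]->base);
    free(registry[i]);
    registry[i] = registry[--registry_len];
}

slab_page *slab_page_create(void) {
    slab_page *pg = calloc(1, sizeof *pg);
    if (pg == NULL) return NULL;
    pg->base = malloc(CHUNK_SIZE * CHUNKS_PER_PAGE);
    if (pg->base == NULL) { free(pg); return NULL; }
    pthread_mutex_lock(&registry_lock);
    bool full = registry_len == MAX_PAGES;
    if (!full) registry[registry_len++] = pg;
    pthread_mutex_unlock(&registry_lock);
    if (full) { free(pg->base); free(pg); return NULL; }
    return pg;
}

/* Owner-only allocation; returns NULL when the page is exhausted. */
void *slab_page_alloc(slab_page *pg) {
    if (pg->next == CHUNKS_PER_PAGE) return NULL;
    void *chunk = pg->base + pg->next++ * CHUNK_SIZE;
    pthread_mutex_lock(&registry_lock);
    pg->live++;
    pthread_mutex_unlock(&registry_lock);
    return chunk;
}

/* Callable from any thread: slab chunks go back to their page, others to free(). */
void slab_free_any(void *ptr) {
    if (ptr == NULL) return;
    pthread_mutex_lock(&registry_lock);
    size_t i = registry_find(ptr);
    if (i == SIZE_MAX) {
        pthread_mutex_unlock(&registry_lock);
        free(ptr); /* not a slab pointer: plain heap allocation */
        return;
    }
    slab_page *pg = registry[i];
    pg->live--;
    if (pg->retired && pg->live == 0) registry_remove_at(i);
    pthread_mutex_unlock(&registry_lock);
}

/* Owner teardown: free now if empty, otherwise retire until the last free.
 * The caller must not touch pg afterwards. */
void slab_page_retire(slab_page *pg) {
    pthread_mutex_lock(&registry_lock);
    for (size_t i = 0; i < registry_len; i++) {
        if (registry[i] != pg) continue;
        if (pg->live == 0) registry_remove_at(i);
        else pg->retired = true;
        break;
    }
    pthread_mutex_unlock(&registry_lock);
}

size_t slab_registry_count(void) {
    pthread_mutex_lock(&registry_lock);
    size_t n = registry_len;
    pthread_mutex_unlock(&registry_lock);
    return n;
}

/* Thread entry that frees a chunk it did not allocate. */
void *free_on_other_thread(void *chunk) {
    slab_free_any(chunk);
    return NULL;
}
```

Every free in this sketch takes the global lock and scans the registry linearly, which is the hot-path cost the maintainers raise in review.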

The slab payload is explicitly aligned to max_align_t, and resolve workers now tear down their thread-local parser/slab state before exiting so registered pages do not keep an owner pointer into dead TLS.
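
malloc() returns memory suitably aligned for any object with fundamental alignment, and a slab that stands in for malloc() under tree-sitter has to keep that promise. A minimal sketch with a hypothetical chunk layout (`slab_chunk`, `page_slot`, and `is_max_aligned` are illustrative, not the PR's code) shows how alignas(max_align_t) on the payload member makes the guarantee hold both for the payload offset and for the stride between consecutive chunks.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk: a small header followed by the payload handed out. */
typedef struct slab_chunk {
    uint32_t page_slot;                             /* illustrative header field */
    alignas(max_align_t) unsigned char payload[48]; /* aligned like malloc() */
} slab_chunk;

/* The payload offset is a multiple of max_align_t's alignment... */
_Static_assert(offsetof(slab_chunk, payload) % alignof(max_align_t) == 0,
               "payload offset must be a multiple of alignof(max_align_t)");
/* ...and so is the stride, so every element of a chunk array stays aligned. */
_Static_assert(sizeof(slab_chunk) % alignof(max_align_t) == 0,
               "chunk stride must preserve max_align_t alignment");

/* Runtime check suitable for tests or debug assertions. */
static int is_max_aligned(const void *p) {
    return (uintptr_t)p % alignof(max_align_t) == 0;
}
```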

Observed environment: macOS 26.5.2 (25F84), Darwin 25.5.0 arm64, Apple clang 21.0.0.

Crash evidence

Local macOS DiagnosticReports show in-pipeline watcher/indexer aborts with SIGABRT / ___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED under tree-sitter frames, including:

  • 2026-07-02 19:31:58 +0300, PID 74258: ts_stack_delete -> ts_parser_delete -> cbm_destroy_thread_parser -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_thread
  • 2026-07-02 19:31:59 +0300, PID 92286: _realloc -> ts_subtree_compress -> ts_parser_parse -> cbm_extract_file -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_thread
  • 2026-07-03 18:48:33 +0300, PID 96664: stack_node_release -> ts_stack_renumber_version -> ts_parser_parse -> cbm_extract_file -> extract_worker -> cbm_parallel_extract -> cbm_pipeline_run_incremental -> cbm_pipeline_run -> watcher_index_fn -> cbm_watcher_poll_once -> cbm_watcher_run -> watcher_thread

The 2026-07-03 reports came from local build UUID 459D6782-328E-308C-BDFB-A117FEC793ED. That build is useful provenance for the same watcher/tree-sitter failure class, but it is not byte-identical to this PR head: this PR head additionally has alignas(max_align_t) and resolve-worker terminal slab teardown.

Tests

  • make -f Makefile.cbm build/c/test-runner CC=clang CXX=clang++
  • ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem (34 passed)
  • ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem pipeline (243 passed)
  • red-first allocator regressions cover:
    • cross-thread free of a slab chunk
    • reclaim while a foreign thread still holds a live chunk
    • destroy while a foreign thread still holds a live chunk

Local indexer timing smoke

Same macOS arm64 host, same checkout, fresh CBM_CACHE_DIR per run, cli index_repository --mode fast:

CBM_WORKERS  main     PR head  PR vs main
1            15.238s  14.949s  -1.9%
2            5.410s   5.329s   -1.5%
4            3.381s   3.307s   -2.2%
8            5.322s   5.536s   +4.0%
10           8.244s   9.246s   +12.2%
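
The last column is the PR head's wall time relative to main. A tiny helper (hypothetical, just the arithmetic behind the table, not a harness from the PR) makes the sign convention explicit: negative means the PR head was faster, positive means it was slower.

```c
#include <assert.h>

/* PR head wall time relative to main, in percent. */
static double pct_vs_main(double main_s, double pr_s) {
    return (pr_s - main_s) / main_s * 100.0;
}

/* True when x rounds to the one-decimal value printed in the table. */
static int matches_1dp(double x, double printed) {
    double d = x - printed;
    return d > -0.05 && d < 0.05;
}
```

For example, the 10-worker row is (9.246 - 8.244) / 8.244 ≈ +12.15%, printed as +12.2%.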

A separate default-worker smoke on the same checkout produced main 8.700s / 8.537s and PR head 13.675s / 9.116s. The first PR run appears to be a cold-start outlier, while the second is about 7% slower than main's second run (9.116s vs 8.537s). The global registry remains a deliberate correctness/performance tradeoff in this version rather than a proven no-cost fast-path change.

SS-42 requested a review from DeusData as a code owner July 2, 2026 18:42
SS-42 force-pushed the fix/slab-cross-thread-free branch 3 times, most recently from 5f891fe to b391b01, July 2, 2026 19:22
DeusData added the labels bug (Something isn't working), stability/performance (Server crashes, OOM, hangs, high CPU/memory), and priority/high (Needs near-term maintainer attention; high-impact bug, regression, safety issue, or release blocker) Jul 3, 2026

DeusData (Owner) commented Jul 3, 2026

Thanks for the allocator fix. Triage: high-priority memory-safety/stability PR.

Review will be fairly strict here: cross-thread slab ownership is low-level and can create subtle corruption if half-fixed. We will look for ASan/UBSan coverage, cross-thread free regression coverage, and confirmation that normal same-thread allocation/free behavior is unchanged.

DeusData (Owner) commented Jul 3, 2026

Thank you, this is a serious, well-written PR, and the two unit tests are honest red-first reproductions of the allocator-invariant violations (a cross-thread free of a slab chunk falls through to plain free() of an interior pointer; a chunk still live on a foreign thread survives across reclaim). Before we can take a redesign of this hot path, though, we need two things:

(1) Evidence of an in-pipeline trigger. We traced the current pipeline and found that parse-thread == free-thread holds everywhere (extract workers free trees before caching; resolve workers re-parse on their own thread; the sequential path is single-threaded), so as far as we can tell this hardens a latent class of bug rather than fixing an observed in-tree crash. Could you share the macOS malloc-diagnostics stack or repro that motivated it? If there IS a real path, that changes the calculus entirely.

(2) Benchmarks. The fix replaces a lock-free O(1) TLS fast path with a global spinlock plus a linear scan of the global page registry on every small tree-sitter alloc/free across all workers. On large indexes that is exactly our hottest loop, and we gate on an indexer baseline (~4.8M nodes / 218s single-run). We'd want numbers showing parity, or a redesign that keeps the fast path lock-free (e.g. a per-page atomic remote-free queue drained by the owner, or aligned pages giving O(1) pointer→page lookup so no global scan is needed).

One residual within the current design: page->owner points into a thread's TLS and can dangle if a thread exits with registered pages (resolve workers have no terminal destroy), so the fix converts main's leak corner case into a potential write into freed TLS. A small heads-up as well: cbm_slab_reset_thread silently changes semantics (rebuild → retire/free); it has no callers today, but the change is worth documenting.

Happy to work through this with you. Allocator correctness matters a lot here, and your unit-level guards are exactly the right foundation. Thanks!
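
One way to read the "aligned pages giving O(1) pointer→page" suggestion, as a minimal sketch under loud assumptions: every slab page is carved from a single reserved, page-aligned arena (the names `arena_init`, `is_slab_ptr`, and `page_of` are hypothetical). "Is this a slab pointer?" then becomes one range check and "which page?" becomes a bit mask, with no registry scan and no lock. The range check is essential: masking an arbitrary malloc() pointer and reading a header at the result would be undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLAB_PAGE_SIZE ((size_t)1 << 16) /* hypothetical 64 KiB pages */
#define SLAB_ARENA_PAGES 4u

typedef struct slab_page_hdr {
    unsigned index; /* position of this page inside the arena */
    void *owner;    /* opaque owner state; never dereferenced here */
} slab_page_hdr;

static unsigned char *arena; /* SLAB_ARENA_PAGES pages, SLAB_PAGE_SIZE-aligned */

static bool arena_init(void) {
    /* C11 aligned_alloc: size must be a multiple of the alignment. */
    arena = aligned_alloc(SLAB_PAGE_SIZE, SLAB_PAGE_SIZE * SLAB_ARENA_PAGES);
    if (arena == NULL) return false;
    for (unsigned i = 0; i < SLAB_ARENA_PAGES; i++) {
        slab_page_hdr *h = (slab_page_hdr *)(arena + (size_t)i * SLAB_PAGE_SIZE);
        h->index = i;
        h->owner = NULL;
    }
    return true;
}

/* O(1) membership test: a single range check, no registry, no lock. */
static bool is_slab_ptr(const void *p) {
    uintptr_t u = (uintptr_t)p, lo = (uintptr_t)arena;
    return arena != NULL && u >= lo && u < lo + SLAB_PAGE_SIZE * SLAB_ARENA_PAGES;
}

/* O(1) pointer -> page: clear the low bits to reach the page header.
 * Only valid after is_slab_ptr(p) has returned true. */
static slab_page_hdr *page_of(const void *p) {
    return (slab_page_hdr *)((uintptr_t)p & ~(uintptr_t)(SLAB_PAGE_SIZE - 1));
}
```

A fixed arena caps total slab memory; lifting that cap needs extra machinery (for example a small table of arena ranges) that this sketch does not attempt, and remote frees still need a synchronization story of their own.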

SS-42 (Author) commented Jul 3, 2026

Updated in head a1e5aa7.

Changes since the review comment:

  • added terminal parser/slab teardown for resolve_worker, so registered pages do not retain an owner pointer into dead worker TLS;
  • added slab_destroy_with_foreign_live_chunk_is_safe alongside the existing cross-thread-free and reclaim-with-foreign-live-chunk regressions;
  • refreshed the PR body with the local macOS DiagnosticReports watcher/indexer stack evidence, including the build UUID caveat;
  • added a local CBM_WORKERS timing table. The 8/10-worker numbers show the global registry path is still a visible correctness/performance tradeoff, not a no-cost fast-path change.
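
Why the terminal teardown matters, as a minimal sketch with hypothetical names (`thread_slab_state`, `shared_page`, `page_adopt`, `resolve_worker_sketch`): a page record that outlives its worker must stop pointing at that worker's thread-local state before the thread returns, because objects with thread storage duration end their lifetime when the thread exits. The sketch tracks a single page; a real teardown would walk every page the thread registered.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread allocator state; lives in TLS, dies with the thread. */
typedef struct thread_slab_state {
    int pages_owned;
} thread_slab_state;

/* Hypothetical registered page that can outlive the worker that owns it. */
typedef struct shared_page {
    pthread_mutex_t lock;
    thread_slab_state *owner; /* NULL once detached */
} shared_page;

static _Thread_local thread_slab_state tls_state;

/* The calling thread takes ownership: owner now points into its TLS. */
static void page_adopt(shared_page *pg) {
    pthread_mutex_lock(&pg->lock);
    pg->owner = &tls_state;
    tls_state.pages_owned++;
    pthread_mutex_unlock(&pg->lock);
}

/* Terminal teardown: drop any owner pointer into this thread's TLS. */
static void thread_slab_teardown(shared_page *pg) {
    pthread_mutex_lock(&pg->lock);
    if (pg->owner == &tls_state) {
        pg->owner = NULL;
        tls_state.pages_owned--;
    }
    pthread_mutex_unlock(&pg->lock);
}

static void *resolve_worker_sketch(void *arg) {
    shared_page *pg = arg;
    page_adopt(pg);
    /* ... parse/resolve work that allocates from the page ... */
    thread_slab_teardown(pg); /* last step before the thread returns */
    return NULL;
}
```

A pthread_key_create destructor is another possible exit hook; the PR describes an explicit terminal teardown in resolve_worker instead.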

Fresh local checks on macOS arm64:

  • ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem -> 34 passed
  • ASAN_OPTIONS=detect_leaks=0 ./build/c/test-runner mem pipeline -> 243 passed

SS-42 added 3 commits July 3, 2026 20:34
Signed-off-by: SS-42 <noreply@incogni.to>
SS-42 force-pushed the fix/slab-cross-thread-free branch from 48008c5 to a1e5aa7, July 3, 2026 17:34

DeusData (Owner) commented Jul 3, 2026

Heads-up: the security / codeql-gate failure on this PR is not your change — it was a repo-side gate bug (the check scanned only the 5 newest CodeQL runs and lost track of PRs on a busy queue). That's fixed on main now (#820). Any push to your branch — or simply clicking GitHub's Update branch button — will trigger a fresh run with the fixed gate and it should go green. Sorry for the noise, and thanks for your patience!

DeusData (Owner) commented Jul 4, 2026

Thank you for the thorough follow-up — the terminal resolve_worker teardown closes the dangling-owner residual, the destroy-with-foreign-live-chunk regression is exactly the missing guard, and we appreciate the honesty of the CBM_WORKERS table: your own 8/10-worker numbers confirm the review's core concern that the global-registry path is a visible cost on the hottest loop. We're re-reviewing the full delta now; the remaining open question is purely the design tradeoff — accept the measured cost for the correctness hardening, or move the remote-free handling to a per-page atomic queue so the alloc fast path stays lock-free. We'll come back with a concrete direction shortly rather than leaving this hanging. Thanks again for engaging so constructively!
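
The per-page alternative raised in review could look roughly like this minimal sketch, with hypothetical names (`remote_free_push`, `owner_drain_remote`, `owner_alloc`) and with size classes, live counts, and page retirement left out. Non-owner threads push freed chunks onto a per-page intrusive stack with a compare-and-swap; the owner takes the whole stack with a single atomic exchange and splices it into its private free list, so the owner's alloc path never takes a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A freed chunk is reused as an intrusive list node. */
typedef struct free_node {
    struct free_node *next;
} free_node;

/* Hypothetical page: local list is owner-only, remote list is shared. */
typedef struct slab_page {
    free_node *local_free;            /* touched only by the owner thread */
    _Atomic(free_node *) remote_free; /* pushed by any non-owner thread */
} slab_page;

/* Non-owner free: lock-free push (Treiber stack). Release publishes n->next. */
static void remote_free_push(slab_page *pg, void *chunk) {
    free_node *n = chunk;
    free_node *head = atomic_load_explicit(&pg->remote_free, memory_order_relaxed);
    do {
        n->next = head; /* head is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
        &pg->remote_free, &head, n, memory_order_release, memory_order_relaxed));
}

/* Owner: take the whole remote list at once and splice it into local_free. */
static size_t owner_drain_remote(slab_page *pg) {
    free_node *n = atomic_exchange_explicit(&pg->remote_free, NULL, memory_order_acquire);
    size_t count = 0;
    while (n != NULL) {
        free_node *next = n->next;
        n->next = pg->local_free;
        pg->local_free = n;
        n = next;
        count++;
    }
    return count;
}

/* Owner fast path: plain pops; atomics only when the local list is empty. */
static void *owner_alloc(slab_page *pg) {
    if (pg->local_free == NULL) owner_drain_remote(pg);
    free_node *n = pg->local_free;
    if (n != NULL) pg->local_free = n->next;
    return n;
}
```

Draining by whole-list exchange sidesteps the ABA hazard of popping single nodes from a Treiber stack. What this sketch does not settle is who drains a page whose owner has exited, which is the retire/teardown question the PR currently answers with the global registry.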

The previous GitHub-hosted macos-14 job failed in edge_types_probe while the same PR head passed repeated local targeted runs. This retriggers the matrix because rerunning failed upstream jobs requires repository admin permissions.

Signed-off-by: SS-42 <noreply@incogni.to>
SS-42 force-pushed the fix/slab-cross-thread-free branch from 2f09ec8 to 825b9fa, July 4, 2026 06:34