perf(index): bound parallel source retention for low RAM#685
perf(index): bound parallel source retention for low RAM#685nguyentamdat wants to merge 1 commit into
Conversation
Limit source retention to a fraction of the memory budget, skip retention when cross-file LSP is disabled or incremental indexing cannot use it, and fail cleanly on extract allocation failures. Verify: make -f Makefile.cbm test Signed-off-by: nguyentamdat <nguyentamdat@gmail.com>
4c38514 to
08319e8
Compare
|
Huge thanks for opening this PR and for the work you put into it. The maintainer shop is currently full, so this may sit for a bit before it gets a proper review. We will come back to this as soon as possible with real feedback; I wanted to make sure it did not sit unacknowledged in the meantime. |
|
Quick status note: this PR is one of four open memory/RAM-policy changes (#833, #752, #586, #685) that we've reviewed individually and found genuinely complementary — so rather than merging them piecemeal, we're doing a combined design pass over the whole memory policy (explicit override, host-tiered defaults, retention bounds, post-index release, and the Windows auto-sync driver in #841) and will respond here with a concrete direction shortly. Your work is very much part of that plan — thanks for your patience! |
…ck (low-RAM peak RSS) Distilled from DeusData#685 (nguyentamdat) rebased onto current main, plus two research-driven refinements and a genuine reproduce-first guard. The parallel extract retains each file's source text so the fused cross-file LSP resolve can re-parse it. That retention is transient but a peak-RSS driver. On main the caps are flat (100 MiB/file, 2 GiB total) and a file over the cap is silently unretained AND its cross-file resolution is skipped -- a graph-quality gap. This change bounds retention AND keeps every cross-file edge. - Source-text cap as a FLOOR, not just a ceiling: retention total defaults to min(cbm_mem_budget()/8, 1 GiB), per-file min(32 MiB, total). Following the rust-analyzer memory model, the RAM-derived default is clamped to a small absolute ceiling so a huge-RAM host does not hold tens of GB it would re-read cheaply. Both caps env-overridable via CBM_RETAIN_TOTAL_MB / CBM_RETAIN_PER_FILE_MB (limits.c convention); ceilings bound only the auto-derived default, never an explicit operator/caller choice. A dropped file emits one index.retain_capped WARN per run. - Bounded re-read fallback (the correctness guarantee): resolve_worker re-reads an unretained file's source on demand (bounded, freed immediately) instead of skipping resolution, wired at every cross-LSP site that consumes source. The cap now only trades retained RAM for a bounded re-read, never a lost edge. - cbm_parallel_extract_ex + opts struct (cbm_parallel_extract is now a wrapper passing NULL -> env-derived defaults); malloc/calloc NULL-check hardening. Reproduce-first: parallel_cross_file_reread_preserves_unretained_edges uses a Java<->Kotlin pair whose cross-file lsp edges are genuinely source-dependent; the edges are lost when the caller is unretained and the fallback is absent (RED), present with it (GREEN), with a retained CONTROL scenario proving non-vacuity. DeusData#685's original Python red test was a false guard (per-file py_lsp already resolves those calls) and is replaced. Peak-bound guards (retained_bytes <= total_cap; retain_sources=false retains nothing) in test_mem.c. Verify: make -f Makefile.cbm cbm && make -f Makefile.cbm lint-ci; test-runner parallel pipeline incremental py_lsp ts_lsp java_lsp kotlin_lsp c_lsp cs_lsp go_lsp rust_lsp mem -> 2323 passed. Co-authored-by: nguyentamdat <nguyentamdat@gmail.com> Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
|
Thank you — your core design was right and we carried it over the line: budget-derived retention bounds plus the bounded re-read fallback so a file over the cap still gets its cross-file resolution instead of silently losing edges. Landed as 6404747 (PR #854) with you credited as co-author. Two refinements from a rust-analyzer memory-model study went in on top: the cap is source-text-specific and modest (a cap is a floor as well as a ceiling — a large RAM-derived cap would just hold that much), with CBM_RETAIN_TOTAL_MB / CBM_RETAIN_PER_FILE_MB env knobs; and while building it we found your original Python red test was a false guard (the per-file py_lsp already resolved those calls, so it passed without the fix) and replaced it with a genuine Java↔Kotlin cross-file fixture that's actually red on the unfixed path. Closing in favor of the distill — this is the retention layer of the #832 memory work; thanks for the solid foundation! |
Summary
retain_sources=falseVerification
make -f Makefile.cbm test->5728 passedNotes
Split out from #684 so the low-RAM indexing change is reviewed independently from the JVM callgraph fix.