feat(pipeline): index-time importance scoring (weighted-degree)#879
Open
petercoxphoto wants to merge 4 commits into
Open
feat(pipeline): index-time importance scoring (weighted-degree)#879petercoxphoto wants to merge 4 commits into
petercoxphoto wants to merge 4 commits into
Conversation
Owner
|
Thanks for splitting this out of #776. Triage: normal-priority feature work for ranking/index quality. Review will focus on score semantics, index-time overhead, properties_json mutation safety, and whether the tests prove stable ordering without making the ranking formula too rigid. This is the right size for a standalone review. |
AC1 tests (tests/test_importance.c) + suite wiring + a no-op pass stub (pass_importance.c) registered as the 7th predump pass in run_predump_passes (after all edge-producing passes). Tests fail against the stub (no importance key persisted) — the red-test boundary. Formula lands in the impl commit. Red evidence: full ASan test-runner run captured at $ARTIFACT_DIR/red-run.log — all 6 importance tests FAIL (test_importance.c:91/167/215/279/331/398). The pre-existing lang_contract ASan abort (vendored ts_runtime lexer UAF) is a baseline flake unrelated to this change. Boundary committed by the orchestrator (recovery: the builder agent authored these files but died on an API overload before committing). Signed-off-by: Peter Cox <info@petercox.ie>
pass_importance.c: weighted-degree Aider model persisted per Function/Method/
Class node as a numeric "importance" key on properties_json (append pattern
from pass_complexity.c). importance = sqrt(num_refs) x priv x generic x
distinct x test_penalty:
num_refs = incoming CALLS + USAGE edges
priv x0.1 leading-underscore
generic x0.1 name defined in >=5 distinct files
distinct x10 snake_case|camelCase and len>=8
test_penalty x0.1 edge-based (TARGET of a TESTS edge, or file is the SOURCE
of a TESTS_FILE edge) -- replaces the filename strstr; a
'test'-substring path with no test edge is NOT penalised.
Weighted-degree only; PageRank deferred (measured decision, builder-notes).
All 10 fixture importance tests pass. Does not touch compute_search_score.
Signed-off-by: Peter Cox <info@petercox.ie>
Real-data verification (full connectors re-index) showed the spec's edge-only TESTS/TESTS_FILE penalty does NOT achieve AC1's exclusion goal: TESTS_FILE has only 15 edges (needs a test-file->prod-file mapping), TESTS never targets a test-resident symbol, and node is_test is stamped on <2% of nodes and none of the test fixtures/doubles. Edge-only left 33/40 of the top ranked symbols as test scaffolding (session_dir, FakeConnectorClient, ...). Fix: penalise when the graph's OWN canonical classifier cbm_is_test_path() (the one pass_tests uses: test_ prefix, _test.<ext>, /tests/, ...) flags the file, OR the symbol is the target of an incoming TESTS edge (prod helpers in non-test files). This is NOT the naive strstr the spec warned against — it leaves src/testutil_helpers.go unpenalised (test-plan DeusData#6 inversion still green). Result after re-index: top-40 = 0 test-file symbols (was 33); named exclusions _run/now/run_hook/client and make_proposals/make_golden/seed_recording all rank >100 (well outside top-40); top is real domain symbols. All 10 unit tests pass. Deviation from the spec's stated mechanism — surfaced to Peter. Signed-off-by: Peter Cox <info@petercox.ie>
Signed-off-by: Peter Cox <info@petercox.ie>
ffcead9 to
cce0f68
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Index-time importance scoring (weighted-degree, Aider-style)
Split out of #776 (now closed) as an atomic feature PR, per @DeusData's request to break that bundle into single-concern PRs. The RSS/backpressure stability fix from that bundle already landed as 610531a; the memory-safety fixes are superseded by upstream's own leak/slab work, so this PR carries only the importance-ranking feature. The
repo_maptool is a separate stacked PR that builds on this.What it does
Adds a 7th predump pass (
pass_importance.c, registered after all edge-producing passes) that persists a numericimportancekey per Function/Method/Class node onproperties_json(append pattern frompass_complexity.c). It does not touchcompute_search_score— it's index-time only.Weighted-degree only; PageRank deferred (measured decision).
Test-penalty note
The original spec used an edge-only TESTS/TESTS_FILE penalty. Real-corpus verification showed that under-fires (TESTS_FILE had ~15 edges;
is_teststamped on <2% of nodes), leaving 33/40 top-ranked symbols as test scaffolding. Switched to the graph's canonicalcbm_is_test_path()classifier (the onepass_testsuses) OR an incoming TESTS edge — not a naivestrstr(it leavestestutil_helpers-style prod helpers unpenalised). After re-index: top-40 = 0 test-file symbols (was 33).Tests
tests/test_importance.c— 10 cases (registration/dispatch, the formula factors, and the test-penalty inversion). Rebased onto currentmain; the only merge work was test-harness wiring (RUN_SELECTED_SUITE).