docs: add real-world bash/ops-tooling repo example to Performance #747
saddestmartian wants to merge 1 commit
The existing Performance section benchmarks large polyglot application codebases (Linux kernel, Django). Bash already scores in the Excellent parsing tier under Language Support, but there's no worked example of the agentic token-efficiency payoff on a repo shape that's mostly shell scripts, YAML/JSON config, and Markdown docs rather than a typical multi-language application.

Adds one real session's measurement: a single search_code call vs a grep-based sub-agent fan-out on an internal dev-tooling repo, clearly labeled as one real-world data point rather than a controlled benchmark.

Signed-off-by: saddestmartian <saddestmartian@gmail.com>
Hey, looks good. Can you remove the last line? Basically, embed the insights naturally :)
Thank you, @saddestmartian. This is a genuinely thoughtful contribution, and you framed it exactly right (honest that it's a single real-world session, not a controlled benchmark). The technical points hold up too: Bash is an Excellent-tier language here, and shell functions are indexed as first-class graph nodes.

We're keeping the README Performance section limited to benchmarks we can reproduce and stand behind end-to-end (the M3 Pro figures, the measured token-efficiency comparison). Readers treat that section as authoritative, and we can't reproduce a third-party session's exact numbers. That's the only reason we're not merging; it's about keeping that section reproducible, not the quality of your work.

If you'd like to share the data point, a GitHub Discussion would be a great home for it and genuinely useful to teams evaluating the tool for ops/bash-heavy repos. Really appreciate the effort. 🙏
What does this PR do?
Adds a small "Real-World Example" subsection to the Performance section of the README.
The existing Performance benchmarks (Linux kernel, Django) are measured on large polyglot application codebases. Bash already scores in the "Excellent" parsing tier under Language Support, but there's no worked example of what that translates to for a different repo shape: infra/ops-tooling repos that are mostly shell scripts, YAML/JSON config, and Markdown docs rather than a typical multi-language application.
This adds one real session's measurement on an internal dev-tooling repo (bash + JSON + Markdown, ~14.9k indexed nodes / ~20.5k edges): a single search_code call reproducing a config-file consumer inventory in ~375ms vs. a grep-based sub-agent fan-out taking ~131s across 12 tool calls for the same result. It's explicitly labeled as one real-world data point, not a controlled multi-trial benchmark, to keep it honest alongside the hardware-benchmarked numbers above it.

Docs-only change; no source touched.
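For readers who want the two quoted timings as a single ratio, here is a minimal sketch of the arithmetic. It uses only the approximate figures stated above (~375 ms, ~131 s, 12 tool calls); the constant and function names are hypothetical, chosen for illustration, and the result inherits the single-session caveat.

```python
# Approximate single-session figures quoted in the PR description.
# Names below are illustrative only; they are not part of the tool.
SEARCH_CODE_SECONDS = 0.375   # one search_code call (~375 ms)
GREP_FANOUT_SECONDS = 131.0   # grep-based sub-agent fan-out (~131 s)
SEARCH_CODE_TOOL_CALLS = 1
GREP_FANOUT_TOOL_CALLS = 12


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


ratio = speedup(GREP_FANOUT_SECONDS, SEARCH_CODE_SECONDS)
call_ratio = GREP_FANOUT_TOOL_CALLS / SEARCH_CODE_TOOL_CALLS

print(f"wall-clock ratio: ~{ratio:.0f}x")
print(f"tool calls: {GREP_FANOUT_TOOL_CALLS} vs {SEARCH_CODE_TOOL_CALLS} ({call_ratio:.0f}x)")
```

Because both inputs are rounded, the ratio is best read as an order of magnitude (a few hundred times faster in this one session) rather than as a precise benchmark result.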
Checklist
- Signed-off commits (git commit -s): required, CI rejects unsigned commits (DCO, see CONTRIBUTING.md)
- Tests (make -f Makefile.cbm test): N/A, docs-only change, no C source touched
- Lint (make -f Makefile.cbm lint-ci): N/A, docs-only change, no C source touched