Docs/fix imprecisions #19

Merged: joseph-isaacs merged 8 commits into develop from docs/fix-imprecisions on Jun 15, 2026

@gargiulofrancesco

Collaborator

No logic changes. Two parts:

Docs Cleanup

  • Corrected a few comments that were out of date or wrong (the code_offsets rationale, the read-padding bound in the spec, a stale AVX-512 mention in a test).
  • Removed dead references (PUBLIC_API.md, a hardcoded local path).
  • Renamed binary-format.md to interchange-format.md to match what it actually describes.
  • Fixed some broken intra-doc links.

CI

  • Added a step that fails the build on broken doc links, so they can't rot again.

gargiulofrancesco and others added 8 commits June 7, 2026 12:39
The spec describes the shared in-memory interchange form that independent
OnPair implementations exchange — not this library's concrete binary/on-disk
layout, which the doc explicitly puts out of scope. Rename the file to match
and reword the README section accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The codes for a row decode independently — the encoder clips every match at
the row end, so a token never spans a row boundary. code_offsets are needed
not for boundary-spanning tokens but because the code stream is a flat
concatenation with no in-band row delimiter, matching the interchange-format
spec (§4). Fix the reason in both the Column field doc and encode_strings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
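
To illustrate the rationale in the commit above, here is a minimal sketch of how per-row offsets recover row boundaries from a flat code stream. The `FlatCodes` type, its field types, and the `u16` code width are assumptions made for illustration; they are not taken from the library's actual `Column` definition.

```rust
/// Hypothetical, simplified column. Names and types are illustrative only.
struct FlatCodes {
    /// Codes of all rows concatenated back to back, with no delimiter between rows.
    codes: Vec<u16>,
    /// Row `i` owns `codes[code_offsets[i]..code_offsets[i + 1]]`, so this
    /// vector has one more entry than there are rows.
    code_offsets: Vec<usize>,
}

impl FlatCodes {
    /// Number of rows described by the offsets.
    fn num_rows(&self) -> usize {
        self.code_offsets.len().saturating_sub(1)
    }

    /// Codes belonging to row `i`, or `None` if `i` is out of range or the
    /// offsets do not describe a valid range of `codes`.
    fn row_codes(&self, i: usize) -> Option<&[u16]> {
        let start = *self.code_offsets.get(i)?;
        let end = *self.code_offsets.get(i.checked_add(1)?)?;
        self.codes.get(start..end)
    }
}
```

Because no token spans a row boundary, each slice returned by `row_codes` can be decoded on its own; the offsets exist only because the stream itself carries no row marker.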
§6 stated `dict_bytes_len >= o_N + MAX_TOKEN_SIZE`, but `o_N` is the
end sentinel (= logical length L, the first padding byte), not the last
token's offset. The decoder reads MAX_TOKEN_SIZE bytes from the highest
*token* offset, `o_{N-1}`, so the bound is `o_{N-1} + MAX_TOKEN_SIZE` —
matching §3.1, §5, and `validate_dictionary`. As written, §6 over-required
by `length(last token)` and would reject conformant minimally-padded
columns (e.g. a zero-padded full-width last token).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
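
The corrected bound can be sketched as a small check. This is not the library's `validate_dictionary`; the function name `padding_is_sufficient`, the slice-based signature, and the value `MAX_TOKEN_SIZE = 16` are assumptions chosen for illustration. The sketch also rejects an empty dictionary, which the spec may treat differently.

```rust
/// Illustrative value only; the real constant is defined by the spec.
const MAX_TOKEN_SIZE: usize = 16;

/// `offsets` holds o_0..=o_N: the N token start offsets followed by the end
/// sentinel o_N. Returns true when a MAX_TOKEN_SIZE-byte read starting at the
/// last token's offset, o_{N-1}, stays inside a buffer of `dict_bytes_len` bytes.
fn padding_is_sufficient(offsets: &[usize], dict_bytes_len: usize) -> bool {
    // At least one token means at least two entries: o_0 and the sentinel.
    if offsets.len() < 2 {
        return false;
    }
    // o_{N-1}: the highest offset at which a token actually starts.
    let last_token_start = offsets[offsets.len() - 2];
    match last_token_start.checked_add(MAX_TOKEN_SIZE) {
        Some(needed) => dict_bytes_len >= needed,
        None => false, // overflow: no buffer can be large enough
    }
}
```

With offsets `[0, 3, 19]` the last token starts at 3 and is 16 bytes wide, so a 19-byte buffer satisfies `o_{N-1} + MAX_TOKEN_SIZE = 19`. The earlier wording, `o_N + MAX_TOKEN_SIZE`, would have demanded 35 bytes for the same column.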
- parser.rs: drop intra-doc links from the public `train`/`parse` docs to
  the crate-private `validate_offsets` (they rendered as dead links on
  docs.rs and failed `cargo doc -D warnings`); inline the offset-validity
  condition instead so the docs stay self-contained.
- fat.rs: rewrite the module doc to match the current single-layout reality;
  the `[super::DecodeEntry]` / `[super::plan]` items it linked were removed.
- benches/tpch.rs: drop the reference to the non-existent PUBLIC_API.md.

cargo doc now passes under -D rustdoc::broken_intra_doc_links and
-D rustdoc::private_intra_doc_links.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These links never resolved; they were invisible only because rustdoc
checks links solely in the items it documents, and these live in (or
point across) private modules:

- column.rs: `crate::decompress` is ambiguous — a `decompress` module and
  a re-exported `decompress` fn share the name. Use the `fn@`
  disambiguator, which selects the fn and still renders as a bare path
  (the `()` suffix would resolve too but leaks parens into the text).
- lpm.rs: `new` / `from_dictionary` are associated fns, not names in
  scope, so the bare links were unresolved. Qualify them with `Self::`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
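
A minimal sketch of the two link fixes described above, assuming the code sits at the crate root. The `Matcher` type and the function bodies are hypothetical stand-ins, and the function here is defined directly rather than re-exported; only the `fn@` disambiguator and the `Self::` qualification mirror the commit.

```rust
/// Hypothetical module that shares its name with the function below.
pub mod decompress {}

/// Hypothetical stand-in for the function that shares the module's name.
pub fn decompress(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

// A bare `crate::decompress` link could mean the module or the function.
// The `fn@` prefix selects the function and is not shown in the rendered text.

/// Placeholder type. Its contents are decoded with [`fn@crate::decompress`].
pub struct Matcher {
    entries: usize,
}

impl Matcher {
    /// Builds an empty matcher. See also [`Self::from_dictionary`].
    pub fn new() -> Self {
        Matcher { entries: 0 }
    }

    /// Builds a matcher with one entry per token. Unlike [`Self::new`],
    /// the result is not empty when `tokens` is non-empty.
    pub fn from_dictionary(tokens: &[&[u8]]) -> Self {
        Matcher { entries: tokens.len() }
    }

    /// Number of entries held by the matcher.
    pub fn len(&self) -> usize {
        self.entries
    }
}
```

Associated functions are not names in scope inside a doc comment, which is why the links to `new` and `from_dictionary` are written through `Self::`.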
Nothing currently guards documentation: CI never runs `cargo doc`, and
without a [lints.rustdoc] table a broken intra-doc link only warns. Add
the lint table (deny broken_intra_doc_links + private_intra_doc_links)
so `cargo doc` fails locally too, and a CI step that runs it. Use
--document-private-items so links inside private modules are checked,
which is where the broken links this branch fixed actually lived.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@codspeed-hq

codspeed-hq Bot commented Jun 15, 2026

Contributor

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 4 improved benchmarks
❌ 4 regressed benchmarks
✅ 24 untouched benchmarks
⏩ 2 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | decompress_all[("l_comment", 16)] | 19 ms | 24.7 ms | -23.08% |
| WallTime | decompress_all[("o_comment", 16)] | 17.3 ms | 21.4 ms | -19.54% |
| WallTime | decompress_all[("o_comment", 12)] | 17 ms | 20.1 ms | -15.41% |
| WallTime | decompress_all[("l_comment", 12)] | 18.9 ms | 21.6 ms | -12.82% |
| WallTime | decompress_all[12] | 877.9 µs | 714.3 µs | +22.89% |
| WallTime | decompress_all[16] | 881.1 µs | 718 µs | +22.72% |
| WallTime | decompress_all[("p_name", 12)] | 1.4 ms | 1.2 ms | +19.34% |
| WallTime | decompress_all[("p_name", 16)] | 1.2 ms | 1.1 ms | +10.79% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing docs/fix-imprecisions (46e0cd1) with develop (e3f73f6)

Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@joseph-isaacs merged commit e38908c into develop on Jun 15, 2026
9 of 12 checks passed
@gargiulofrancesco deleted the docs/fix-imprecisions branch on June 15, 2026 17:08