Docs/fix imprecisions#19
The spec describes the shared in-memory interchange form that independent OnPair implementations exchange, not this library's concrete binary/on-disk layout, which the doc explicitly puts out of scope. Rename the file to match and reword the README section accordingly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The codes for a row decode independently: the encoder clips every match at the row end, so a token never spans a row boundary. `code_offsets` are needed not for boundary-spanning tokens but because the code stream is a flat concatenation with no in-band row delimiter, matching the interchange-format spec (§4). Fix the reason in both the `Column` field doc and `encode_strings`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
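To illustrate why the offsets are needed, here is a minimal sketch of slicing a flat code stream by row. The `Column` shape, the `u16` code width, and the helper names are simplified assumptions for illustration, not this library's actual definitions.

```rust
/// Hypothetical, simplified stand-in for the column described above.
struct Column {
    /// Flat concatenation of every row's token codes, with no row delimiter.
    codes: Vec<u16>,
    /// Row `i` owns `codes[code_offsets[i]..code_offsets[i + 1]]`;
    /// the final entry is the end sentinel (`codes.len()`).
    code_offsets: Vec<usize>,
}

impl Column {
    /// Number of rows: one fewer than the number of offsets.
    fn num_rows(&self) -> usize {
        self.code_offsets.len().saturating_sub(1)
    }

    /// Codes belonging to row `i`. Without the offsets there would be no way
    /// to tell where one row's codes end and the next row's begin.
    fn row_codes(&self, i: usize) -> &[u16] {
        &self.codes[self.code_offsets[i]..self.code_offsets[i + 1]]
    }
}

/// Three rows: two codes, zero codes (empty string), three codes.
fn example_column() -> Column {
    Column {
        codes: vec![7, 3, 9, 4, 4],
        code_offsets: vec![0, 2, 2, 5],
    }
}
```

Because no token spans a row boundary, each slice returned by `row_codes` can be decoded on its own.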
§6 stated `dict_bytes_len >= o_N + MAX_TOKEN_SIZE`, but `o_N` is the
end sentinel (= logical length L, the first padding byte), not the last
token's offset. The decoder reads MAX_TOKEN_SIZE bytes from the highest
*token* offset, `o_{N-1}`, so the bound is `o_{N-1} + MAX_TOKEN_SIZE` —
matching §3.1, §5, and `validate_dictionary`. As written, §6 over-required
by `length(last token)` and would reject conformant minimally-padded
columns (e.g. a zero-padded full-width last token).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
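The corrected bound can be sketched as a small check. This is an illustration only: the value of `MAX_TOKEN_SIZE`, the function name `read_padding_ok`, and the treatment of an empty dictionary are assumptions, and the crate's real `validate_dictionary` may differ.

```rust
/// Assumed value for illustration only; the spec defines the real constant.
const MAX_TOKEN_SIZE: usize = 16;

/// `offsets` holds N + 1 entries: `o_0..o_{N-1}` are token starts and
/// `o_N` is the end sentinel (the logical length L).
fn read_padding_ok(offsets: &[usize], dict_bytes_len: usize) -> bool {
    match offsets {
        // Assumption: with no tokens there is no read to bound.
        [] | [_] => true,
        // The decoder reads MAX_TOKEN_SIZE bytes starting at the highest
        // *token* offset, o_{N-1}, not at the sentinel o_N.
        [.., last_token, _sentinel] => last_token
            .checked_add(MAX_TOKEN_SIZE)
            .map_or(false, |end| dict_bytes_len >= end),
    }
}
```

With offsets `[0, 4, 20]` the last token is full-width (16 bytes under the assumed constant), so a 20-byte buffer satisfies `o_{N-1} + MAX_TOKEN_SIZE`, whereas the old `o_N + MAX_TOKEN_SIZE` wording would have demanded 36 bytes.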
- parser.rs: drop intra-doc links from the public `train`/`parse` docs to the crate-private `validate_offsets` (they rendered as dead links on docs.rs and failed `cargo doc -D warnings`); inline the offset-validity condition instead so the docs stay self-contained.
- fat.rs: rewrite the module doc to match the current single-layout reality; the `[super::DecodeEntry]` / `[super::plan]` items it linked were removed.
- benches/tpch.rs: drop the reference to the non-existent PUBLIC_API.md.

`cargo doc` now passes under `-D rustdoc::broken_intra_doc_links` and `-D rustdoc::private_intra_doc_links`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
These links never resolved; they were invisible only because rustdoc checks links solely in the items it documents, and these live in (or point across) private modules:
- column.rs: `crate::decompress` is ambiguous: a `decompress` module and a re-exported `decompress` fn share the name. Use the `fn@` disambiguator, which selects the fn and still renders as a bare path (the `()` suffix would resolve too but leaks parens into the text).
- lpm.rs: `new` / `from_dictionary` are associated fns, not names in scope, so the bare links were unresolved. Qualify them with `Self::`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
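The two link fixes can be sketched in isolation as follows. All items here (`decompress`, `Column`, `Matcher`) are hypothetical stand-ins with placeholder bodies, not the crate's real code; only the doc-comment link syntax is the point.

```rust
/// Hypothetical module that shares its name with the function below.
pub mod decompress {}

/// Hypothetical function named like the module above (placeholder body).
pub fn decompress(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

/// Expand with [`decompress`](fn@decompress); the `fn@` disambiguator
/// selects the function rather than the module of the same name.
pub struct Column;

/// Hypothetical type with two constructors.
pub struct Matcher {
    tokens: usize,
}

impl Matcher {
    /// Creates an empty matcher. See also [`Self::from_dictionary`].
    pub fn new() -> Self {
        Self { tokens: 0 }
    }

    /// Creates a matcher holding `tokens` entries, unlike [`Self::new`].
    /// A bare `new` link would not resolve: associated fns are not in scope.
    pub fn from_dictionary(tokens: usize) -> Self {
        Self { tokens }
    }

    /// Number of entries held.
    pub fn tokens(&self) -> usize {
        self.tokens
    }
}
```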
Nothing currently guards documentation: CI never runs `cargo doc`, and without a `[lints.rustdoc]` table a broken intra-doc link only warns. Add the lint table (deny `broken_intra_doc_links` + `private_intra_doc_links`) so `cargo doc` fails locally too, and a CI step that runs it. Use `--document-private-items` so links inside private modules are checked, which is where the broken links this branch fixed actually lived.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Merging this PR will not alter performance

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | decompress_all[("l_comment", 16)] | 19 ms | 24.7 ms | -23.08% |
| ❌ | WallTime | decompress_all[("o_comment", 16)] | 17.3 ms | 21.4 ms | -19.54% |
| ❌ | WallTime | decompress_all[("o_comment", 12)] | 17 ms | 20.1 ms | -15.41% |
| ❌ | WallTime | decompress_all[("l_comment", 12)] | 18.9 ms | 21.6 ms | -12.82% |
| ⚡ | WallTime | decompress_all[12] | 877.9 µs | 714.3 µs | +22.89% |
| ⚡ | WallTime | decompress_all[16] | 881.1 µs | 718 µs | +22.72% |
| ⚡ | WallTime | decompress_all[("p_name", 12)] | 1.4 ms | 1.2 ms | +19.34% |
| ⚡ | WallTime | decompress_all[("p_name", 16)] | 1.2 ms | 1.1 ms | +10.79% |
Comparing docs/fix-imprecisions (46e0cd1) with develop (e3f73f6)
Footnotes
- 2 benchmarks were skipped, so the baseline results were used instead.
No logic changes. Two parts:
Docs Cleanup
- Fixed imprecisions (the `code_offsets` rationale, the read-padding bound in the spec, a stale AVX-512 mention in a test).
- Removed dead references (`PUBLIC_API.md`, a hardcoded local path).
- Renamed `binary-format.md` → `interchange-format.md` to match what it actually describes.
CI