
rustdoc-json: Postcard output #142642


Draft
wants to merge 1 commit into
base: master
from


aDotInTheVoid
Member

r? @ghost

What

rustdoc --output-format=postcard is like rustdoc-json, but using postcard (https://postcard.rs/, https://docs.rs/postcard/1.1.1/) instead of JSON.

Why

JSON's size and speed aren't great. People want more speed and smaller docs. There are proposals to make the JSON smaller (and therefore faster) by shortening field names and omitting fields whose value is the default. But

How good is it?

In a very unscientific benchmark on aws-sdk-ec2, the output is ~3.6x smaller (255 MiB vs 69 MiB) and ~1.8x faster to deserialize (1.6273 s vs 914.05 ms).

What's the metaformat

  • 22 bytes of magic numbers
  • varint(u32) format version
  • Crate as usual

This way, users can look at the magic number to check it's a rustdoc-json-postcard file, then read the version number to know if they can decode it. Only then can they deserialize the Crate itself. I plan to write a library that does this, so it's easy to do well.
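
As a rough sketch of what such a reader could look like (not the planned library), the following checks the magic bytes and decodes the version before touching the Crate payload. It assumes the MAGIC constant quoted in this PR's diff and that varint(u32) means postcard's LEB128-style encoding (7 payload bits per byte, least significant group first, at most 5 bytes). The names HeaderError, read_varint_u32 and parse_header are hypothetical.

```rust
/// Magic bytes as quoted in this PR's diff (22 bytes).
const MAGIC: [u8; 22] = *b"\x00\xFFRustdocJsonPostcard\xFF";

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    BadVarint,
}

/// Decode a LEB128-style varint(u32) from the front of `input`.
/// Returns the value and the remaining bytes.
fn read_varint_u32(input: &[u8]) -> Result<(u32, &[u8]), HeaderError> {
    let mut value: u32 = 0;
    for (i, &byte) in input.iter().enumerate().take(5) {
        let payload = (byte & 0x7F) as u32;
        // The fifth byte can only carry the top 4 bits of a u32.
        if i == 4 && payload > 0x0F {
            return Err(HeaderError::BadVarint);
        }
        value |= payload << (7 * i);
        // A clear high bit marks the last byte of the varint.
        if byte & 0x80 == 0 {
            return Ok((value, &input[i + 1..]));
        }
    }
    // Ran out of bytes, or more than 5 continuation bytes.
    Err(HeaderError::BadVarint)
}

/// Check the magic, then read the format version.
/// Returns the version and the still-encoded Crate payload.
fn parse_header(file: &[u8]) -> Result<(u32, &[u8]), HeaderError> {
    if file.len() < MAGIC.len() {
        return Err(HeaderError::TooShort);
    }
    let (magic, rest) = file.split_at(MAGIC.len());
    if magic != &MAGIC[..] {
        return Err(HeaderError::BadMagic);
    }
    read_varint_u32(rest)
}

fn main() {
    // Fake file: magic, version 300 encoded as [0xAC, 0x02], then payload bytes.
    let mut file = MAGIC.to_vec();
    file.extend_from_slice(&[0xAC, 0x02, 0xDE, 0xAD]);
    println!("{:?}", parse_header(&file));
    println!("{:?}", parse_header(b"not rustdoc output, just text"));
}
```

Only after parse_header succeeds and the version is one the consumer understands would the remaining bytes be handed to a postcard deserializer.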

Why is this a draft

  • Needs more tests of CLI flag interactions
  • Currently both HtmlRenderer and JsonRenderer are configured from the same options; we should change this
  • I want to make it more principled how rustdoc, before the format step, inspects .is_json(), instead of relying on the current hacks
  • Compiletest changes should be spun into their own thing
  • Docs

@rustbot rustbot added A-compiletest Area: The compiletest test runner A-rustdoc-json Area: Rustdoc JSON backend A-testsuite Area: The testsuite used to check the correctness of rustc T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jun 17, 2025
@aDotInTheVoid
Member Author

pub mod postcard {

pub type Magic = [u8; 22];
pub const MAGIC: Magic = *b"\x00\xFFRustdocJsonPostcard\xFF";
Member Author


A friend points out https://hackers.town/@zwol/114155807716413069, with advice on how to design a magic number.

Contributor


Having 'Json' in there seems perverse :)

@aDotInTheVoid
Member Author

Something @jamesmunns pointed out is that this means that reordering fields or enum variants in rustdoc-json-types will now require a FORMAT_VERSION bump. This could probably be detected in CI using postcard-schema.
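
To illustrate why field order matters, here is a toy, hand-rolled positional encoding (not postcard itself, and Item is a made-up struct, not one from rustdoc-json-types). Like postcard, it writes values in declaration order with no field names, so a writer that swaps two fields produces bytes an older reader silently misreads.

```rust
/// Hypothetical item; not a real rustdoc-json-types struct.
#[derive(Debug, PartialEq)]
struct Item {
    id: u8,
    deprecated: bool,
}

/// Writer using the original field order (id, deprecated).
/// No field names are written, only values.
fn encode(item: &Item) -> Vec<u8> {
    vec![item.id, item.deprecated as u8]
}

/// Writer after a hypothetical refactor that swapped the two fields.
fn encode_reordered(item: &Item) -> Vec<u8> {
    vec![item.deprecated as u8, item.id]
}

/// Reader built against the original field order.
fn decode(bytes: &[u8]) -> Option<Item> {
    match bytes {
        [id, flag] => Some(Item { id: *id, deprecated: *flag != 0 }),
        _ => None,
    }
}

fn main() {
    let item = Item { id: 7, deprecated: false };
    // Same order on both sides: the value round-trips.
    println!("{:?}", decode(&encode(&item)));
    // Reordered writer, old reader: decoding succeeds but the fields are wrong.
    println!("{:?}", decode(&encode_reordered(&item)));
}
```

With JSON the field names travel with the data, so the same reordering would be harmless; with a positional format it is a wire change.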

More broadly, we should think about where (if at all) postcard-schema fits into this.

@jamesmunns
Member

As a note, I'm working on iterating on postcard and postcard-schema right now, postcard-schema-ng was just released, and is a form that might be releasable as a 1.0 soon, but I'd need to finish up the items at jamesmunns/postcard#241 to see if any additional iteration is required.

postcard-schema gets you two interesting pieces of data:

  1. postcard-schema::Key can be used to generate an 8-byte hash of the schema and the string of your choice, as a const. This can be snapshotted to detect wire changes in CI.
  2. postcard-schema::NamedType/postcard-schema-ng::DataModelType is the full reflection-style schema of the data type, which is also serializable as postcard data. This can be useful if you want the data to be archival: storing the schema inside the file itself, so you could still decode it even if the schema changes (using the postcard-dyn crate, giving you a serde_json::Value-like view of the data).

postcard is also getting a 2.0 soon, but it's important to note that the wire format is NOT changing. You will be able to use library versions v1.0 and v2.0 interchangeably with respect to serialization/deserialization (it's a breaking change because I'm removing some external crates that are now out of date from my public API; it's likely your code won't need to change at all).

@jamesmunns
Member

A possibly useful form for the file format could be:

struct PostcardFile<T> {
    key: Key,
    schema: Option<Schema>,
    data: T,
}

I've considered "standardizing" this format a bit, maybe with a trailing CRC32.

@aDotInTheVoid
Member Author

postcard is also getting a 2.0 soon, but it's important to note that the wire format is NOT changing. [...]

Awesome! It'd be great to not have cobs and embedded-io in Cargo.lock (and that for all the users that care about performance).

A possibly useful form for the file format could be:

I think we definitely want to keep the magic number, so that consumers can tell if this file is rustdoc output at all, and a linear format version, so they can tell if rustdoc is too old or too new for them when the schema has changed (vs a schema hash, which only tells you that it has changed). Embedding the schema into the output itself is an interesting idea; I'll need to look more at it. But as long as both of these come after the magic number and linear format version, we should be fine to change them after the fact.
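
A small sketch of the difference, with hypothetical names (Compat, check_version, hash_matches) and made-up version numbers: an ordered version lets a consumer say which side is out of date, while a hash comparison can only say "different".

```rust
/// Hypothetical result of comparing a file's format version with what a consumer supports.
#[derive(Debug, PartialEq)]
enum Compat {
    Supported,
    /// The rustdoc that wrote the file is older than the consumer understands.
    ProducerTooOld,
    /// The rustdoc that wrote the file is newer than the consumer understands.
    ProducerTooNew,
}

/// A linear version is ordered, so the direction of the mismatch is known.
fn check_version(found: u32, min_supported: u32, max_supported: u32) -> Compat {
    if found < min_supported {
        Compat::ProducerTooOld
    } else if found > max_supported {
        Compat::ProducerTooNew
    } else {
        Compat::Supported
    }
}

/// A schema hash is unordered: equal or not, with no hint about which side to upgrade.
fn hash_matches(found: [u8; 8], expected: [u8; 8]) -> bool {
    found == expected
}

fn main() {
    // Made-up numbers: a consumer that understands versions 40 through 45.
    println!("{:?}", check_version(39, 40, 45));
    println!("{:?}", check_version(46, 40, 45));
    println!("{}", hash_matches([1; 8], [2; 8]));
}
```

That is why the hash could sit alongside the version (for CI or archival use) rather than replace it.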

4 participants