Define the archive format #9
Comments
It's worth noting that rustdoc doesn't just want to append to an archive, but also to update files that already exist in the archive...
For context: When Cargo runs a cargo doc command, it invokes rustdoc multiple times on the same output directory, once for each dependency. This allows it to update a handful of shared files - the search index, the new source files index, the shared CSS/JS/font resources - so that the whole dependency tree can act like a single unit. The important piece here is that we need to be able to read in the existing search index (for example), add in the records for the crate being documented, and save it back into the archive.
If I understand the current format correctly (note: I have not done any actual reading on it), this could be as trivial as removing it from the current archive, modifying it in-memory, then saving it at the end and updating the index appropriately. But if static-filez goes to a format where the files are more interleaved, that will be more difficult. (It sounds like that's not going to happen, but it's worth noting.) |
A quick way to "support" this is to just append the overwritten files and have the index point at the last version only.
|
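The append-and-shadow idea above can be sketched in a few lines. This is a toy model with invented names, not static-filez's actual code: overwritten files are simply appended again as fresh gzip members, and the index is updated to point at the last copy, leaving the old bytes behind unreferenced.

```python
import gzip

class Archive:
    """Toy append-only archive: overwriting a path appends a new gzip
    member and re-points the index at it (hypothetical API)."""

    def __init__(self):
        self.blob = b""    # concatenated gzip members
        self.index = {}    # path -> (offset, length) of the latest copy

    def write(self, path, data):
        member = gzip.compress(data)
        self.index[path] = (len(self.blob), len(member))  # shadow old entry
        self.blob += member  # old bytes stay behind, unreferenced

    def read(self, path):
        off, length = self.index[path]
        return gzip.decompress(self.blob[off:off + length])

a = Archive()
a.write("search-index.js", b"crate A")
a.write("search-index.js", b"crate A + crate B")  # rustdoc-style update
assert a.read("search-index.js") == b"crate A + crate B"
```

The cost of this scheme is wasted space for superseded copies, which a later compaction pass could reclaim; the benefit is that updates never need to rewrite existing bytes.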
Why not ZIP? |
Does a zip archive allow us to get files out of it as individual gzip streams so we can send them without extracting and re-compressing? |
The format itself should allow you to get a […]. As for browser support, both Firefox and Chrome send […]. In any case, I don't expect a documentation browser to get thousands of requests per second. |
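The point of the question above, serving stored gzip members without extracting and re-compressing, can be sketched as follows. This is an illustrative fragment with invented names, not the crate's real server code: if the client advertises gzip support in Accept-Encoding (as Firefox and Chrome do), the stored bytes are sent untouched with Content-Encoding: gzip; otherwise the server decompresses on the fly.

```python
import gzip

def respond(stored_member: bytes, accept_encoding: str):
    """Serve a file straight from the archive: pass the stored gzip
    member through unchanged when the client accepts gzip."""
    if "gzip" in accept_encoding:
        return {"Content-Encoding": "gzip"}, stored_member
    return {}, gzip.decompress(stored_member)

member = gzip.compress(b"<html>docs</html>")
headers, body = respond(member, "gzip, deflate, br")
assert body == member  # no re-compression needed
headers, body = respond(member, "identity")
assert body == b"<html>docs</html>"
```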
Another thing to consider is that if you're just browsing the docs on your computer, you might as well send the files to the browser without compression. And if you want to host your crate's documentation somewhere, static file hosting is probably more accessible than a VPS or something that can run code. I'm not sure what other use cases you're thinking of. Being able to serve compressed content might ultimately be a nice feature, but wouldn't really matter. |
Interesting. My main concern with this crate is making a *very* efficient way to store and serve compressed data, and while the motivation is the use with rustdoc, ideally it doesn't end there. So when we choose a new archive format, I wouldn't want it to have worse performance than the ad-hoc solution we have right now; it should only add compatibility -- either with existing applications or with future versions/features of this/rustdoc.
|
Fair enough, there's nothing bad in wanting it to be as fast as possible. |
It would be nice to support alternative compression formats, brotli/zstd would both be useful as they compress html better than gzip. Maybe the index could record a global or per-file format, and maybe even support multiple formats to allow the server to negotiate which to serve. |
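The per-file-format index suggested above could look something like this. The index shape and function names are invented for illustration: each file may be stored in several encodings, and the server picks the best one the client's Accept-Encoding header allows, preferring brotli for its better HTML compression.

```python
# Hypothetical index shape: path -> {encoding: (offset, length)}.
INDEX = {
    "index.html": {"gzip": (0, 120), "br": (120, 95)},
}

def pick_encoding(path, accept_encoding):
    """Return the best stored encoding the client accepts:
    brotli first, then gzip, else None (serve decompressed)."""
    accepted = {e.split(";")[0].strip() for e in accept_encoding.split(",")}
    for enc in ("br", "gzip"):
        if enc in INDEX[path] and enc in accepted:
            return enc
    return None

assert pick_encoding("index.html", "gzip, deflate, br") == "br"
assert pick_encoding("index.html", "gzip") == "gzip"
assert pick_encoding("index.html", "identity") is None
```

Storing multiple encodings of the same file trades archive size for the ability to negotiate; a global per-archive format flag would be the simpler variant.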
Let's define the format of our archives.
Current state
A binary file that is actually just concatenated gzip blobs.
Features:
Prior art
What I learned: GZIP members
While reading the WARC spec I found this interesting section:
I did not know this about gzip! If I'm reading this correctly, it means that we can, in theory, use files compatible with tar (or WARC), with the additional requirement that each file is a new gzip member (so that we can continue to get slices from our index file that point to valid gzip files we can serve).
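The multi-member property can be checked directly with Python's stdlib `gzip` module (used here only to demonstrate the format property, not as part of the crate):

```python
import gzip

# Two independent gzip members, concatenated back to back.
a = gzip.compress(b"first file")
b = gzip.compress(b"second file")
archive = a + b

# Each slice recorded in an index is a complete, valid gzip file...
assert gzip.decompress(archive[:len(a)]) == b"first file"
assert gzip.decompress(archive[len(a):]) == b"second file"

# ...and the concatenation is itself a valid multi-member gzip stream.
assert gzip.decompress(archive) == b"first filesecond file"
```

This is exactly the property the current format relies on: the index can hand out byte ranges that are individually servable, while standard tools can still decompress the archive as a whole.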
Options
cc @QuietMisdreavus