
@std/tar hangs indefinitely unlike @std/archive #6019

Open
lowlighter opened this issue Sep 19, 2024 · 8 comments
Labels
bug Something isn't working needs triage

Comments

@lowlighter
Contributor

Describe the bug

Given the following valid archive:

@lowlighter ➜ /tmp $ wget https://github.com/rustwasm/wasm-pack/releases/download/v0.13.0/wasm-pack-v0.13.0-x86_64-unknown-linux-musl.tar.gz
...
@lowlighter ➜ /tmp $ tar -ztvf wasm-pack-v0.13.0-x86_64-unknown-linux-musl.tar.gz 
drwxrwxrwx 0/0               0 2024-07-01 18:54 wasm-pack-v0.13.0-x86_64-unknown-linux-musl/
-rwxrwxrwx 0/0           10850 2024-07-01 18:52 wasm-pack-v0.13.0-x86_64-unknown-linux-musl/LICENSE-APACHE
-rwxrwxrwx 0/0            1059 2024-07-01 18:52 wasm-pack-v0.13.0-x86_64-unknown-linux-musl/LICENSE-MIT
-rwxrwxrwx 0/0            3383 2024-07-01 18:52 wasm-pack-v0.13.0-x86_64-unknown-linux-musl/README.md
-rwxrwxrwx 0/0         6893312 2024-07-01 18:54 wasm-pack-v0.13.0-x86_64-unknown-linux-musl/wasm-pack

Steps to Reproduce

Using the new @std/tar, the code gets stuck (or, if it isn't stuck, it takes far longer than a few minutes):

import { UntarStream } from "jsr:@std/tar/untar-stream"
const response = await fetch("https://github.com/rustwasm/wasm-pack/releases/download/v0.13.0/wasm-pack-v0.13.0-x86_64-unknown-linux-musl.tar.gz")
const entries = response.body!
    .pipeThrough(new DecompressionStream("gzip"))
    .pipeThrough(new UntarStream())

for await (const entry of entries) {
    console.log(entry.path, entry.header.size)
}
@lowlighter ➜ /workspaces/libs/bundle (archive-to-tar) $ deno --version
deno 1.46.3 (stable, release, x86_64-unknown-linux-gnu)
v8 12.9.202.5-rusty
typescript 5.5.2
@lowlighter ➜ /workspaces/libs/bundle (archive-to-tar) $ deno run -A wasm/test.ts 
wasm-pack-v0.13.0-x86_64-unknown-linux-musl/ 0
wasm-pack-v0.13.0-x86_64-unknown-linux-musl/LICENSE-APACHE 10850
^C

Expected behavior

Using the old @std/archive/untar works without issues:

import { Untar } from "jsr:@std/archive/untar"
import { readerFromStreamReader } from "jsr:@std/io"

const response = await fetch("https://github.com/rustwasm/wasm-pack/releases/download/v0.13.0/wasm-pack-v0.13.0-x86_64-unknown-linux-musl.tar.gz")
const entries = new Untar(readerFromStreamReader(response.body!
    .pipeThrough(new DecompressionStream("gzip"))
    .getReader()
))

for await (const entry of entries) {
    console.log(entry.fileName, entry.fileSize)
}
@lowlighter ➜ /workspaces/libs/bundle (archive-to-tar) $ deno --version
deno 1.46.3 (stable, release, x86_64-unknown-linux-gnu)
v8 12.9.202.5-rusty
typescript 5.5.2
@lowlighter ➜ /workspaces/libs/bundle (archive-to-tar) $ deno run -A wasm/test.ts 
wasm-pack-v0.13.0-x86_64-unknown-linux-musl/ 0
wasm-pack-v0.13.0-x86_64-unknown-linux-musl/LICENSE-APACHE 10850
wasm-pack-v0.13.0-x86_64-unknown-linux-musl/LICENSE-MIT 1059
wasm-pack-v0.13.0-x86_64-unknown-linux-musl/README.md 3383
wasm-pack-v0.13.0-x86_64-unknown-linux-musl/wasm-pack 6893312

References

@lowlighter lowlighter added bug Something isn't working needs triage labels Sep 19, 2024
@lowlighter lowlighter changed the title @std/tar hands indefinitely unlike @std/archive @std/tar hangs indefinitely unlike @std/archive Sep 19, 2024
@BlackAsLight
Contributor

The problem is that you're not handling the entry.readable within the for loop. If .readable exists on the entry, then you need to either consume it entirely or .cancel() it before the next entry in the loop will resolve.
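(In the original snippet, adding await entry.readable?.cancel() inside the loop unblocks it.) The behaviour can be reproduced without tar at all. Below is a minimal sketch, not the @std/tar internals, of an entry source that, like UntarStream, will not yield the next entry until the previous entry's readable has been consumed or cancelled; all names here are hypothetical:

```typescript
// Hypothetical entry shape mirroring what the thread discusses.
interface Entry {
  path: string;
  readable: ReadableStream<Uint8Array>;
}

// Yields entries one at a time; blocks until each readable is read or cancelled.
async function* entrySource(files: [string, Uint8Array][]): AsyncGenerator<Entry> {
  for (const [path, data] of files) {
    let done!: () => void;
    const finished = new Promise<void>((resolve) => (done = resolve));
    const readable = new ReadableStream<Uint8Array>(
      {
        pull(controller) {
          controller.enqueue(data);
          controller.close();
          done(); // the consumer actually read this entry's data
        },
        cancel() {
          done(); // the consumer explicitly discarded this entry
        },
      },
      { highWaterMark: 0 }, // only produce when the consumer actually reads
    );
    yield { path, readable };
    await finished; // if downstream neither consumes nor cancels, this never resolves
  }
}

const seen: string[] = [];
for await (
  const entry of entrySource([
    ["a.txt", new TextEncoder().encode("A")],
    ["b.txt", new TextEncoder().encode("B")],
  ])
) {
  seen.push(entry.path);
  await entry.readable.cancel(); // omit this line and the loop hangs after "a.txt"
}
console.log(seen.join(",")); // a.txt,b.txt
```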

@lionel-rowe
Contributor

Might be useful to give TarStreamEntry a Symbol.dispose method so you can do this (pending a fix for this Deno bug):

for await (using entry of entries) {
    console.log(entry.path, entry.header.size)
}

@BlackAsLight
Contributor

Might be useful to give TarStreamEntry a Symbol.dispose method so you can do this (pending a fix for this Deno bug):

for await (using entry of entries) {
    console.log(entry.path, entry.header.size)
}

Maybe. The user-error bug would then become somebody trying to consume the readable stream, not awaiting its consumption within the scope, and getting thrown a bad resource ID.

@lowlighter
Contributor Author

Is blocking the next entry until the previous readable is either discarded or consumed a limitation, or is it by design?

If the latter, I feel it is slightly inconvenient.
Beyond the API change making it less of a drop-in replacement, it prevents some parallelization patterns like Array.fromAsync().

In any case it's easy to miss in the docs; especially if you're the one reviewing someone else's code, you're probably going to miss this specificity.

@BlackAsLight
Contributor

Is blocking the next entry until the previous readable is either discarded or consumed a limitation, or is it by design?

It's a limitation of tar. Each file in a tar archive is appended one after the other, so tar isn't designed to be extracted in parallel, but one entry after the other. A parallel version could be made specifically for Deno, but it would require the archive to be saved to disk uncompressed so it could seek around the file; and I don't know many people who write uncompressed tar files to disk.

Streams are also designed to work linearly, meaning one must work with the data from left to right. Streams are great for working linearly over large amounts of data while keeping a small memory footprint.
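The sequential constraint is visible in the format itself: each entry is a 512-byte header (name, octal size, etc.) followed by the file data padded to a 512-byte boundary, so the offset of entry N+1 is only known after entry N's size field has been read. A minimal sketch of walking an in-memory uncompressed tar buffer (toy example; real headers also carry a checksum, which is omitted here):

```typescript
// Walk the entries of an uncompressed tar buffer sequentially.
// Each header is 512 bytes; data is padded up to the next 512-byte boundary.
function* listTar(buf: Uint8Array): Generator<{ name: string; size: number }> {
  const decoder = new TextDecoder();
  let offset = 0;
  while (offset + 512 <= buf.length) {
    const header = buf.subarray(offset, offset + 512);
    if (header.every((b) => b === 0)) break; // zero blocks mark end of archive
    const name = decoder.decode(header.subarray(0, 100)).replace(/\0.*$/, "");
    const sizeField = decoder.decode(header.subarray(124, 136)).replace(/\0/g, "").trim();
    const size = parseInt(sizeField, 8); // size is stored as octal ASCII
    yield { name, size };
    offset += 512 + Math.ceil(size / 512) * 512; // skip header + padded data
  }
}

// Build a toy single-file archive to demonstrate (checksum omitted).
function toyHeader(name: string, size: number): Uint8Array {
  const h = new Uint8Array(512);
  const enc = new TextEncoder();
  h.set(enc.encode(name), 0); // name field: offset 0, 100 bytes
  h.set(enc.encode(size.toString(8).padStart(11, "0") + "\0"), 124); // size field
  return h;
}

const data = new TextEncoder().encode("hello");
const archive = new Uint8Array(512 + 512 + 1024); // header + padded data + trailer
archive.set(toyHeader("hello.txt", data.length), 0);
archive.set(data, 512);

const listed = [...listTar(archive)];
console.log(listed[0].name, listed[0].size); // hello.txt 5
```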

@BlackAsLight
Contributor

Might be useful to give TarStreamEntry a Symbol.dispose method so you can do this (pending a fix for this Deno bug):

for await (using entry of entries) {
    console.log(entry.path, entry.header.size)
}

Going back to this idea of implementing Symbol.asyncDispose for TarStreamEntry.

We could implement it to solve the above issue of hanging indefinitely when entry?.readable isn't handled within the scope.

One consideration to take into account, though: if the user did this, forgetting to await the piping, the code would then throw a bad resource ID as the scope ends while the readable tries to continue. That may be more desirable behaviour than hanging indefinitely.

for await (using entry of readable) {
  console.log(entry.path, entry.header.size)
  entry.readable?.pipeTo((await Deno.create(entry.path)).writable)
}

Another consideration: if the user is properly handling entry.readable, then the using keyword is completely redundant, and using const would save a few checks.

for await (const entry of readable) {
  console.log(entry.path, entry.header.size)
  if (condition) await entry.readable?.pipeTo((await Deno.create(entry.path)).writable)
  else await entry.readable?.cancel()
}

When using const in this second snippet, forgetting the await keyword is perfectly acceptable.

@lionel-rowe
Contributor

lionel-rowe commented Sep 24, 2024

Oh yeah, I guess it should be Symbol.asyncDispose as readable.cancel is async.

With Symbol.asyncDispose, the correct syntax is for await (await using entry of readable) rather than for await (using entry of readable)... however, in practice I don't think it will make a difference, as the loop iteration is only suspended until cancel resolves. Not sure if that could cause any footguns.

Edit: nvm, using won't call Symbol.asyncDispose, only Symbol.dispose. So for await (await using ...) is necessary, unless overloading Symbol.dispose to return a promise (which seems like a bad idea).
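A sketch of the distinction (hypothetical wrapper, not the @std/tar API): using only calls Symbol.dispose, while await using awaits Symbol.asyncDispose, which is what an async readable.cancel() needs. The fallback to a registry symbol below is only there so the sketch runs on runtimes predating explicit resource management:

```typescript
// Fall back to a registry symbol where Symbol.asyncDispose doesn't exist yet.
const asyncDispose: symbol =
  (Symbol as { asyncDispose?: symbol }).asyncDispose ?? Symbol.for("Symbol.asyncDispose");

// Hypothetical disposable wrapper around an entry's readable.
function disposableEntry(path: string, readable: ReadableStream<Uint8Array>) {
  return {
    path,
    readable,
    cancelled: false,
    async [asyncDispose]() {
      this.cancelled = true;
      await readable.cancel(); // async cleanup: needs `await using`, not `using`
    },
  };
}

const entry = disposableEntry("README.md", new ReadableStream<Uint8Array>());
// Roughly what `for await (await using entry of ...)` does at the end of an iteration:
await (entry as any)[asyncDispose]();
console.log(entry.cancelled); // true
```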

@BlackAsLight
Contributor

however, in practice I don't think it will make a difference, as the loop iteration is only suspended until cancel resolves. Not sure if that could cause any footguns.

Thinking about it, in the example where one forgets to await the piping, I think it's more likely that the stream becomes corrupt: some of it is discarded through .cancel() while other parts are piped, resulting in the piped data missing chunks.
