feat: Add encoding option for binary output (#20)#56
Conversation
3add330 to
7af92ba
Compare
ide
left a comment
There was a problem hiding this comment.
@kitten I worked on this API to add binary support and there are a couple decisions I'd like to get your thoughts on.
stdout: string + stdoutBytes: Uint8Array vs. stdout: string | Uint8Array. I opted to change which fields get defined in the output based on the specified encoding, rather than change the type of the stdout and stderr fields. This is because it makes the caller's code clearer and more greppable for a reader to see whether a string or byte array is being passed around.
Rejecting the promise vs throwing when lazily accessing stdout/stderr. The reason I removed the lazy accessors is to reduce memory usage in common cases. The peak memory usage is still 2x (chunks + the final output) but we free the chunks immediately after. On the other hand, it could be annoying to have spawnAsync reject just because of the amount of output, when the maxBuffer parameter is really intended as a safe guard.
|
@ide: I think, I'd still prefer the overload to be honest, since, from a clarity standpoint, we wouldn't expect If we assume that the most common case is the string output case, then I think that's basically acceptable, and aligning with Node would be (imo) preferable over the difference in calls (I'd say, if we do split them, it'd almost worth separating this into a separate export entirely, but if we have an overload, I'd reuse the property names)
The reason I added the lazy rejection is basically for safe-guarding old calls too (not quite backwards compatibility but in the same spirit). The main motivation was to ensure that:
I think V8 has special handling of It's possible that in the tl;dr: I wasn't too concerned about total memory usage (peak RSS) since it'd be very temporary, but about overall memory pressure with small string allocations, which wouldn't apply to |
|
Types: thanks for being a sounding board. Let's go with the option that follows Node's convention. I don't think it's as good of an API (IMO plain greppability with fewer overloads is valuable) but this is also not necessarily a place we want to spend our "creativity budget". On the point about concatenation and GC costs: this PR addresses this concern by concatenating only when the child process completes. It's not as lazy as a getter, but it also doesn't build up a string until the very end. I'm not so worried about backwards compatibility because Node would have crashed if spawnAsync read in over 512MB of text and I can't imagine anyone relying on that behavior. To me, the main question is the API ergonomics when |
7af92ba to
796a617
Compare
c9e012f to
71897db
Compare
Why
===
`spawnAsync` always decodes child output with `toString('utf8')`, which
corrupts binary output, e.g. `pandoc` writing a `.docx` to stdout.
(Closes #20.)
How
===
Add an `encoding` option to `SpawnOptions`. `encoding: 'buffer'` returns
`stdout` / `stderr` / `output` as `Uint8Array`; the default `'utf8'` and
other `BufferEncoding` values are unchanged. `SpawnResult<T = string>`
is now parameterized on the stdio element type and overloads select
between the string and buffer forms; existing `SpawnResult` references
resolve to `SpawnResult<string>` with no source changes.
The result is built once when the process exits, freeing the
intermediate chunks immediately instead of retaining them behind lazy
getters for the result's lifetime. Exceeding the buffer cap rejects in
all cases: an explicit `maxBuffer` already rejected, and the default
cap previously resolved then threw lazily on property access.
A `maxBuffer` larger than the encoding's runtime hard limit
(`MAX_STRING_LENGTH` for text, `MAX_LENGTH` for `'buffer'`) now throws
`TypeError` synchronously at the call site. Previously it was silently
clamped, which led to confusing rejection messages later.
Test Plan
=========
Unit tests; the existing suite passes unchanged. New tests:
- returns stdout/stderr as Uint8Array under encoding: 'buffer'
- survives a non-UTF-8 byte sequence: bytes preserved, not replaced
- populates output as [stdout, stderr] of Uint8Array in buffer mode
- attaches bytes to the error on non-zero exit, like string stdout
- enforces maxBuffer under encoding: 'buffer'
- decodes stdout with latin1, and with hex
- rejects suggesting the string-length / byte-array limit when the
default cap is exceeded
- throws TypeError synchronously when maxBuffer exceeds the encoding's
hard limit; accepts a maxBuffer exactly equal to it
71897db to
56b6ebf
Compare
|
@kitten three key behaviors now implemented, could you sanity check them?
|
Why
spawnAsyncalways decodes child output withtoString('utf8'), which corrupts binary output, e.g.pandocwriting a.docxto stdout. (Closes #20.)How
Add an
encodingoption toSpawnOptions.encoding: 'buffer'returnsstdout/stderr/outputasUint8Array; the default'utf8'and otherBufferEncodingvalues are unchanged.SpawnResult<T = string>is now parameterized on the stdio element type and overloads select between the string and buffer forms; existingSpawnResultreferences resolve toSpawnResult<string>with no source changes.The result is built once when the process exits, freeing the intermediate chunks immediately instead of retaining them behind lazy getters for the result's lifetime. Exceeding the buffer cap rejects in all cases: an explicit
maxBufferalready rejected, and the default cap previously resolved then threw lazily on property access.A
maxBufferlarger than the encoding's runtime hard limit (MAX_STRING_LENGTHfor text,MAX_LENGTHfor'buffer') now throwsTypeErrorsynchronously at the call site. Previously it was silently clamped, which led to confusing rejection messages later.Behavior changes from 1.8.0
maxBufferexceededERR_CHILD_PROCESS_STDIO_MAXBUFFER; truncated stdio attachedMAX_STRING_LENGTH) exceededresult.stdout/result.stderrlazily throwsERR_CHILD_PROCESS_STDIO_MAXBUFFERmaxBufferlarger than the encoding's hard limitMath.minTypeErrorat the call siteTest Plan
Unit tests; the existing suite passes unchanged. New tests: