feat: Add `encoding` option for binary output (#20) by ide-agent · Pull Request #56 · expo/spawn-async

ide-agent · 2026-05-21T23:08:47Z

Why

spawnAsync always decodes child output with toString('utf8'), which corrupts binary output, e.g. pandoc writing a .docx to stdout. (Closes #20.)
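The corruption described above can be reproduced without spawning anything. The following is a minimal sketch, assuming only Node's `Buffer`; the byte values are made up for illustration (a zip-style header followed by two bytes that are never valid in UTF-8).

```typescript
import { Buffer } from 'node:buffer';

// Illustrative bytes: "PK\x03\x04" followed by 0xff 0xfe, which cannot appear in UTF-8.
const original = Uint8Array.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe]);

// Decoding as UTF-8 replaces each invalid byte with U+FFFD (the replacement character).
const decoded = Buffer.from(original).toString('utf8');

// Re-encoding the decoded string does not bring the original bytes back.
const roundTripped = Buffer.from(decoded, 'utf8');
const preserved = Buffer.from(original).equals(roundTripped);

console.log(preserved); // false
console.log(original.length, roundTripped.length); // 6 10
```

Each invalid byte becomes a three-byte replacement character on the way back, so the output is both different and longer than what the child process wrote.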

How

Add an encoding option to SpawnOptions. encoding: 'buffer' returns stdout / stderr / output as Uint8Array; the default 'utf8' and other BufferEncoding values are unchanged. SpawnResult<T = string> is now parameterized on the stdio element type and overloads select between the string and buffer forms; existing SpawnResult references resolve to SpawnResult<string> with no source changes.
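As a rough sketch of how overloads can select between the string and buffer forms, consider the following. The `buildResult` helper and the `TextOptions` / `BufferOptions` types are hypothetical names for illustration, and `SpawnResult` is reduced to the three stdio fields; the real types in the PR carry more.

```typescript
import { Buffer } from 'node:buffer';

// Simplified stand-in for the real SpawnResult: only the stdio fields are modeled.
interface SpawnResult<T = string> {
  stdout: T;
  stderr: T;
  output: [T, T];
}

// Hypothetical option shapes used only by this sketch.
type TextOptions = { encoding?: BufferEncoding };
type BufferOptions = { encoding: 'buffer' };

// Overloads: 'buffer' selects the Uint8Array form, anything else the string form.
function buildResult(out: Uint8Array, err: Uint8Array, options: BufferOptions): SpawnResult<Uint8Array>;
function buildResult(out: Uint8Array, err: Uint8Array, options?: TextOptions): SpawnResult;
function buildResult(
  out: Uint8Array,
  err: Uint8Array,
  options: TextOptions | BufferOptions = {}
): SpawnResult | SpawnResult<Uint8Array> {
  const encoding = options.encoding;
  if (encoding === 'buffer') {
    // Buffer mode: hand the bytes back untouched.
    return { stdout: out, stderr: err, output: [out, err] };
  }
  // Text mode: decode with the requested encoding, defaulting to 'utf8'.
  const stdout = Buffer.from(out).toString(encoding ?? 'utf8');
  const stderr = Buffer.from(err).toString(encoding ?? 'utf8');
  return { stdout, stderr, output: [stdout, stderr] };
}

const bytes = Uint8Array.from([0x68, 0x69]);
const empty = new Uint8Array(0);
const text = buildResult(bytes, empty); // typed as SpawnResult<string>
const raw = buildResult(bytes, empty, { encoding: 'buffer' }); // typed as SpawnResult<Uint8Array>
const hex = buildResult(bytes, empty, { encoding: 'hex' });
```

Because `T` defaults to `string`, code that mentions plain `SpawnResult` keeps compiling against the string form.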

The result is built once when the process exits, freeing the intermediate chunks immediately instead of retaining them behind lazy getters for the result's lifetime. Exceeding the buffer cap rejects in all cases: an explicit maxBuffer already rejected, and the default cap previously resolved then threw lazily on property access.
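One way to picture "built once when the process exits" is a collector that holds raw chunks and joins them a single time. This is a hypothetical sketch, not the PR's code; `ChunkCollector` and its method names are invented here.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical collector: gathers raw chunks and concatenates them exactly once.
class ChunkCollector {
  private chunks: Uint8Array[] | null = [];
  private byteLength = 0;

  // Called for every 'data' event while the child is running.
  push(chunk: Uint8Array): void {
    if (this.chunks === null) throw new Error('collector already finished');
    this.chunks.push(chunk);
    this.byteLength += chunk.byteLength;
  }

  // Called once when the child exits: join the chunks, then drop the list
  // so the individual chunks become eligible for garbage collection.
  finish(): Buffer {
    if (this.chunks === null) throw new Error('collector already finished');
    const joined = Buffer.concat(this.chunks, this.byteLength);
    this.chunks = null;
    return joined;
  }
}

const collector = new ChunkCollector();
collector.push(Buffer.from('he'));
collector.push(Buffer.from('llo'));
const joinedText = collector.finish().toString('utf8'); // 'hello'
```

Memory briefly holds both the chunks and the joined buffer, which matches the 2x peak mentioned later in the thread.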

A maxBuffer larger than the encoding's runtime hard limit (MAX_STRING_LENGTH for text, MAX_LENGTH for 'buffer') now throws TypeError synchronously at the call site. Previously it was silently clamped, which led to confusing rejection messages later.
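The synchronous check could look roughly like the sketch below. `resolveMaxBuffer` is a hypothetical helper name and the error message is invented; `constants.MAX_STRING_LENGTH` and `constants.MAX_LENGTH` are real exports of `node:buffer`. The default of `MAX_STRING_LENGTH` for both modes follows what is stated later in this thread.

```typescript
import { constants } from 'node:buffer';

type Encoding = BufferEncoding | 'buffer';

// Hypothetical helper: returns the effective cap, or throws before any process is spawned.
function resolveMaxBuffer(encoding: Encoding, maxBuffer?: number): number {
  // Text is limited by the longest string the runtime allows, bytes by the largest Buffer.
  const hardLimit = encoding === 'buffer' ? constants.MAX_LENGTH : constants.MAX_STRING_LENGTH;
  if (maxBuffer === undefined) {
    // Per the thread, the implicit default is MAX_STRING_LENGTH even in buffer mode.
    return constants.MAX_STRING_LENGTH;
  }
  if (maxBuffer > hardLimit) {
    // Invented message; the point is that the failure happens at the call site.
    throw new TypeError(`maxBuffer ${maxBuffer} exceeds the limit ${hardLimit} for encoding '${encoding}'`);
  }
  return maxBuffer;
}
```

A value exactly equal to the hard limit is accepted, mirroring the boundary case listed in the test plan.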

Behavior changes from 1.8.0

| Scenario | 1.8.0 | This PR |
| --- | --- | --- |
| Process exits 0, output under cap | resolves with full output | same |
| Process exits non-zero (any output size) | rejects; truncated stdio attached to the error | same |
| Explicit `maxBuffer` exceeded | rejects with `ERR_CHILD_PROCESS_STDIO_MAXBUFFER`; truncated stdio attached | same |
| Default cap (`MAX_STRING_LENGTH`) exceeded | resolves; reading `result.stdout` / `result.stderr` lazily throws `ERR_CHILD_PROCESS_STDIO_MAXBUFFER` | rejects with the same code; truncated stdio attached |
| `maxBuffer` larger than the encoding's hard limit | silently clamped via `Math.min` | synchronous `TypeError` at the call site |
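Since every cap overflow now surfaces as a rejection, callers that want the truncated output need to catch it. The sketch below shows one possible caller-side pattern; `SpawnFailure`, `isMaxBufferError`, and `stdoutOrTruncated` are hypothetical names, and the error shape is an assumption based on the "truncated stdio attached" wording in the table.

```typescript
// Error code named in the table above; Node's child_process uses the same code.
const MAX_BUFFER_CODE = 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER';

// Assumed error shape: an Error carrying a code and whatever stdio was captured.
interface SpawnFailure extends Error {
  code?: string;
  stdout?: string | Uint8Array;
  stderr?: string | Uint8Array;
}

// Type guard that recognizes a cap overflow among arbitrary thrown values.
function isMaxBufferError(error: unknown): error is SpawnFailure {
  if (!(error instanceof Error)) return false;
  const { code } = error as SpawnFailure;
  return code === MAX_BUFFER_CODE;
}

// Runs a task that resolves with text output; on a cap overflow, falls back
// to the truncated text attached to the error. Other failures are rethrown.
async function stdoutOrTruncated(run: () => Promise<{ stdout: string }>): Promise<string> {
  try {
    return (await run()).stdout;
  } catch (error) {
    if (isMaxBufferError(error) && typeof error.stdout === 'string') {
      return error.stdout;
    }
    throw error;
  }
}
```

Whether falling back to truncated output is appropriate depends on the caller; for binary output a partial result is usually unusable and rethrowing is the safer choice.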

Test Plan

Unit tests; the existing suite passes unchanged. New tests:

returns stdout/stderr as Uint8Array under encoding: 'buffer'
survives a non-UTF-8 byte sequence: bytes preserved, not replaced
populates output as [stdout, stderr] of Uint8Array in buffer mode
attaches bytes to the error on non-zero exit, like string stdout
enforces maxBuffer under encoding: 'buffer'
decodes stdout with latin1, and with hex
rejects suggesting the string-length limit when text default cap is exceeded
rejects suggesting a larger maxBuffer when bytes default cap is exceeded
throws TypeError synchronously when maxBuffer exceeds the encoding's hard limit; accepts a maxBuffer exactly equal to it

ide

@kitten I worked on this API to add binary support and there are a couple decisions I'd like to get your thoughts on.

stdout: string + stdoutBytes: Uint8Array vs. stdout: string | Uint8Array. I opted to change which fields get defined in the output based on the specified encoding, rather than change the type of the stdout and stderr fields. This is because it makes the caller's code clearer and more greppable for a reader to see whether a string or byte array is being passed around.

Rejecting the promise vs. throwing when lazily accessing stdout/stderr. I removed the lazy accessors to reduce memory usage in common cases. Peak memory usage is still 2x (chunks + the final output), but we free the chunks immediately afterward. On the other hand, it could be annoying to have spawnAsync reject just because of the amount of output, when the maxBuffer parameter is really intended as a safeguard.

kitten · 2026-05-22T18:10:22Z

@ide: I think I'd still prefer the overload, to be honest, since from a clarity standpoint we wouldn't expect encoding and the result types to be too far away from each other in most cases. The typing is part of the clarity here, and Node does this quite frequently too.

If we assume that the most common case is the string output case, then I think that's basically acceptable, and aligning with Node would be (imo) preferable over the difference in calls. (I'd say, if we do split them, it'd almost be worth separating this into a separate export entirely, but if we have an overload, I'd reuse the property names.)

> On the other hand, it could be annoying to have spawnAsync reject just because of the amount of output, when the maxBuffer parameter is really intended as a safeguard.

The reason I added the lazy rejection is basically to safeguard old calls too (not quite backwards compatibility, but in the same spirit). The main motivation was to ensure that:

we only hold on to the raw chunks in memory
we concat once, to avoid lots of GC work (the string concat was implicitly pretty expensive, if you consider GC too)
we sometimes don't even use stdout or stderr, so can clearly let the buffers be freed

I think V8 has special handling of ArrayBuffer memory, so the main thing I wanted to ensure was that the concat is done once (or not at all, when it's not required), and that we don't convert to a string too early, so that small per-chunk strings are never allocated. (This is likely more predictable in terms of GC load.)
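For readers following along, the lazy approach described here can be sketched like this. `LazyText` is a hypothetical class for illustration and is not the 1.8.0 source; it only shows the "concat once, or not at all" idea.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical illustration of lazy decoding: raw chunks are kept until first read.
class LazyText {
  private chunks: Uint8Array[] | null;
  private text: string | undefined;

  constructor(chunks: Uint8Array[]) {
    this.chunks = chunks;
  }

  // Concatenates and decodes on first access only; later reads reuse the cached string.
  // If the getter is never read, no string is ever allocated.
  get value(): string {
    if (this.text === undefined) {
      this.text = Buffer.concat(this.chunks ?? []).toString('utf8');
      this.chunks = null; // the raw chunks can now be garbage collected
    }
    return this.text;
  }
}

// The euro sign is E2 82 AC in UTF-8; here it is split across two chunks.
const lazy = new LazyText([Uint8Array.from([0xe2, 0x82]), Uint8Array.from([0xac])]);
console.log(lazy.value); // '€'
```

Joining before decoding also keeps multi-byte characters intact when a chunk boundary falls in the middle of one, which decoding each chunk separately with `toString` would not.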

It's possible that in the encoding: 'buffer' mode, small concats are fine, but I didn't immediately test the difference. It's possible we'd want to do something cleverer for that case and maybe even eagerly concat. I haven't benchmarked this though, so it's just a suspicion that different optimisations would apply in that case

tl;dr: I wasn't too concerned about total memory usage (peak RSS) since it'd be very temporary, but about overall memory pressure with small string allocations, which wouldn't apply to encoding: 'buffer'

ide · 2026-05-22T21:56:36Z

Types: thanks for being a sounding board. Let's go with the option that follows Node's convention. I don't think it's as good of an API (IMO plain greppability with fewer overloads is valuable) but this is also not necessarily a place we want to spend our "creativity budget".

On the point about concatenation and GC costs: this PR addresses this concern by concatenating only when the child process completes. It's not as lazy as a getter, but it also doesn't build up a string until the very end.

I'm not so worried about backwards compatibility because Node would have crashed if spawnAsync read in over 512MB of text and I can't imagine anyone relying on that behavior. To me, the main question is the API ergonomics when maxBuffer is a lower number that the developer expects to cross.

ide · 2026-05-23T06:47:50Z

@kitten three key behaviors are now implemented; could you sanity-check them?

stdout/stderr have polymorphic types, specifically they can be strings or buffers depending on the chosen encoding
the default maxBuffer is MAX_STRING_LENGTH even for the "buffer" encoding (because MAX_LENGTH for bytes is MAX_SAFE_INTEGER, not very useful)
the promise rejects after the child process exits if stdout or stderr exceeds maxBuffer, regardless of whether maxBuffer was specified or is the implicit default (the table in the PR description shows the behavior)

ide-agent requested a review from ide May 21, 2026 23:08

ide-agent force-pushed the worktree-issue-20-encoding-buffer branch 2 times, most recently from 3add330 to 7af92ba May 21, 2026 23:48

ide requested a review from kitten May 21, 2026 23:52

ide reviewed May 22, 2026

kitten reviewed May 22, 2026

Comment thread src/spawnAsync.ts Outdated

kitten reviewed May 22, 2026

Comment thread src/spawnAsync.ts Outdated

ide-agent force-pushed the worktree-issue-20-encoding-buffer branch from 7af92ba to 796a617 May 22, 2026 23:26

ide self-requested a review May 22, 2026 23:27

ide-agent force-pushed the worktree-issue-20-encoding-buffer branch 5 times, most recently from c9e012f to 71897db May 23, 2026 06:25

ide-agent force-pushed the worktree-issue-20-encoding-buffer branch from 71897db to 56b6ebf May 23, 2026 06:26

ide approved these changes May 23, 2026

ide requested a review from kitten May 23, 2026 06:47

ide-agent wants to merge 1 commit into main from worktree-issue-20-encoding-buffer
