
encodeBase64/decodeBase64 seems to be inefficient #5944

Open
sigmaSd opened this issue Sep 11, 2024 · 11 comments

sigmaSd (Contributor) commented Sep 11, 2024

These functions use atob and btoa internally, and they OOM easily with large strings.

Maybe they should instead use node:buffer: Buffer.from(i).toString("base64") and Buffer.from(i, "base64").
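
For reference, the node:buffer calls being suggested look roughly like this (a sketch; the input bytes are just a placeholder):

import { Buffer } from "node:buffer";

const bytes = new TextEncoder().encode("hello world");

// Encode: Uint8Array -> base64 string.
const b64 = Buffer.from(bytes).toString("base64"); // "aGVsbG8gd29ybGQ="

// Decode: base64 string -> bytes (Buffer is a Uint8Array subclass).
const decoded = new Uint8Array(Buffer.from(b64, "base64"));

console.log(b64, decoded);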

sigmaSd added the bug and needs triage labels Sep 11, 2024
iuioiua added the encoding label and removed the bug and needs triage labels Sep 12, 2024
iuioiua (Collaborator) commented Sep 12, 2024

We wouldn't use node:buffer in order to maintain browser compatibility. Either way, any PRs that improve the performance of these functions are welcome.

FergoTheGreat commented

Only atob is being used, not btoa. Also, it's not being used in the btoa(String.fromCharCode(...bytes)) manner that I typically associate with call stack size issues. Do you mean to say you are actually running out of heap memory? How large is "large"?

sigmaSd (Contributor, Author) commented Sep 15, 2024

import { encodeBase64 } from "jsr:@std/encoding/base64";

encodeBase64(Deno.readFileSync(Deno.execPath()));

This crashes with an out-of-memory error. If I use a Node Buffer instead, it finishes in a couple of milliseconds.

FergoTheGreat commented Sep 15, 2024

It seems that the memory exhaustion is likely coming from all of the string appending. I imagine that this causes a memory allocation with every new character that is appended. I have no idea whether this change makes it as fast as Node (probably not), but at the very least it eliminates unnecessary memory allocations.

>>> NOT TESTED <<<

// Assumes the std-internal validateBinaryLike helper that the existing
// @std/encoding/base64 implementation already imports (path assumed).
import { validateBinaryLike } from "./_validate_binary_like.ts";

const decoder = new TextDecoder();

// Char codes of the standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
const base64abc = Uint8Array.from([
    65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
    80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 97, 98, 99, 100,
    101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
    113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 48, 49,
    50, 51, 52, 53, 54, 55, 56, 57, 43, 47
]);

export function encodeBase64(data: ArrayBuffer | Uint8Array | string): string {
    // CREDIT: https://gist.github.com/enepomnyaschih/72c423f727d395eeaa09697058238727
    const uint8 = validateBinaryLike(data);
    const result = new Uint8Array(Math.floor((uint8.length + 2) / 3) * 4);
    let i, j = 0;
    const l = uint8.length;
    for (i = 2; i < l; i += 3) {
        result[j++] = base64abc[(uint8[i - 2] >> 2)];
        result[j++] = base64abc[
            (((uint8[i - 2] & 0x03) << 4) | ((uint8[i - 1] >> 4)))
        ];
        result[j++] = base64abc[
            (((uint8[i - 1] & 0x0f) << 2) | ((uint8[i] >> 6)))
        ];
        result[j++] = base64abc[(uint8[i] & 0x3f)];
    }
    if (i === l + 1) {
        // 1 octet yet to write
        result[j++] = base64abc[(uint8[i - 2] >> 2)];
        result[j++] = base64abc[((uint8[i - 2] & 0x03) << 4)];
        result[j++] = 61; // '='
        result[j++] = 61; // '='
    }
    if (i === l) {
        // 2 octets yet to write
        result[j++] = base64abc[(uint8[i - 2] >> 2)];
        result[j++] = base64abc[
            (((uint8[i - 2] & 0x03) << 4) | ((uint8[i - 1] >> 4)))
        ];
        result[j++] = base64abc[((uint8[i - 1] & 0x0f) << 2)];
        result[j++] = 61; // '='
    }
    return decoder.decode(result);
}
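
As a quick sanity check, the sketch above can be compared against the built-in btoa on a small input (assuming validateBinaryLike resolves as noted above):

const sample = "hello world";
console.log(encodeBase64(new TextEncoder().encode(sample))); // "aGVsbG8gd29ybGQ="
console.log(btoa(sample)); // same output expected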

sigmaSd (Contributor, Author) commented Sep 16, 2024

What about running the Node code when we're in a server context, and the current version (or an optimized one) when we detect we're in the browser?
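
A minimal sketch of that conditional idea (hypothetical wiring, not how @std/encoding is structured today; encodeBase64Js stands in for the pure-JS implementation):

import { encodeBase64 as encodeBase64Js } from "jsr:@std/encoding/base64";

// Prefer node:buffer where the runtime provides it (Deno, Node, Bun);
// fall back to the browser-compatible pure-JS path otherwise.
let encode: (bytes: Uint8Array) => string = encodeBase64Js;
try {
  const { Buffer } = await import("node:buffer");
  encode = (bytes) => Buffer.from(bytes).toString("base64");
} catch {
  // node:buffer is unavailable (e.g. a plain browser) -- keep the JS version.
}

console.log(encode(new TextEncoder().encode("hello"))); // "aGVsbG8="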

sigmaSd (Contributor, Author) commented Sep 16, 2024

Also, is it OK for std to use Wasm libraries? In that case even the browser branch could be made efficient.

FergoTheGreat commented

On large buffers, it seems like Node's Buffer.toString("base64") is about 6-7 times faster than the code I provided.
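
One way to set up such a comparison with Deno.bench, run via "deno bench" (a sketch; this measures the published @std/encoding implementation rather than the typed-array version above, and the input is kept modest so the slower path still completes):

import { encodeBase64 } from "jsr:@std/encoding/base64";
import { Buffer } from "node:buffer";

const data = new Uint8Array(4 * 1024 * 1024); // 4 MiB of zeros

Deno.bench("std encodeBase64", () => {
  encodeBase64(data);
});

Deno.bench("Buffer.toString('base64')", () => {
  Buffer.from(data).toString("base64");
});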

kt3k (Member) commented Sep 18, 2024

import { encodeBase64 } from "jsr:@std/encoding/base64";

encodeBase64(Deno.readFileSync(Deno.execPath()));

@FergoTheGreat Can you check that the above snippet passes with your change? If it passes, it would be far better than the current state.

sigmaSd (Contributor, Author) commented Sep 18, 2024

Personally, I think you should just use Wasm for this; it's the best use case for Wasm (pure math work). I imagine it will be faster than any optimized JS version. https://docs.rs/base64/latest/base64/ should work well.

Also, can you confirm that it's not possible (or not a good idea) to use Node APIs conditionally when we detect we're not in the browser?

FergoTheGreat commented

import { encodeBase64 } from "jsr:@std/encoding/base64";

encodeBase64(Deno.readFileSync(Deno.execPath()));

@FergoTheGreat Can you check that the above snippet passes with your change? If it passes, it would be far better than the current state.

With that change, the only limiting factor is that decoder.decode cannot produce a string with length larger than 2**29-24, meaning that it will fail for buffers > 402,653,166 bytes.
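
For the record, the arithmetic behind that cutoff (taking the 2**29 - 24 character cap on the decoded string as given):

// Base64 emits 4 output characters for every 3 input bytes, so for n input
// bytes: ceil(n / 3) * 4 <= 2 ** 29 - 24, i.e. n <= ((2 ** 29 - 24) / 4) * 3.
console.log(((2 ** 29 - 24) / 4) * 3); // 402653166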

kt3k (Member) commented Sep 19, 2024

Also, can you confirm that it's not possible (or not a good idea) to use Node APIs conditionally when we detect we're not in the browser?

I'm not in favor of changing the implementation completely based on the environment.

Looking at the Deno implementation, atob and Buffer.from(string, "base64") are backed by very similar ops: https://github.com/denoland/deno/blob/f460188e583f00144000aa0d8ade08218d47c3c1/ext/web/lib.rs#L131-L144
I'm not sure switching to Buffer.from would boost performance significantly.

Another point is that Base64 encoding and decoding are now planned as a language feature (it seems to already be at Stage 3): https://github.com/tc39/proposal-arraybuffer-base64
Waiting for this to be added to the CLI would also be an option.
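
For reference, the proposal adds toBase64/fromBase64 (and hex equivalents) on Uint8Array. A sketch of how they would be used once available, guarded because no stable runtime had shipped them at the time of writing (the casts are only there because TypeScript's lib does not know the methods yet):

const bytes = new TextEncoder().encode("Hello");

// deno-lint-ignore no-explicit-any
const u8 = bytes as any;
if (typeof u8.toBase64 === "function") {
  console.log(u8.toBase64()); // "SGVsbG8="
  // deno-lint-ignore no-explicit-any
  console.log((Uint8Array as any).fromBase64("SGVsbG8=")); // Uint8Array(5) [ 72, 101, 108, 108, 111 ]
}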
