std.math.big.int: Support strings up to base 36 #22461

linusg · 2025-01-11T00:47:24Z

lib/std/math/big/int.zig

ifreund

This LGTM, Thanks! I'll rebase this branch for you and set it to auto merge when CI passes

linusg · 2025-01-17T19:49:45Z

It looks like calcSetStringLimbCount() does not work with bases up to 36 in 32 bit builds (limb_bits is derived from usize) - I only tested a 64 bit build locally and didn't notice. Not sure how to fix that easily tbh.

ifreund · 2025-01-17T21:03:03Z

It looks like calcSetStringLimbCount() does not work with bases up to 36 in 32 bit builds (limb_bits is derived from usize) - I only tested a 64 bit build locally and didn't notice. Not sure how to fix that easily tbh.

Ah indeed, I should have read the CI failures more carefully rather than assuming they were likely outdated transient failures that would be solved by a rebase.

I don't see a trivial solution to that issue either, solving it properly seems like it would require some clear thinking about math that I don't want to get into right now.

We could restrict the base to @min(limb_bits, 36) and document this limitation for now if that unblocks your use-case.

alexrp · 2025-02-12T10:12:59Z

Just for triage purposes, what's the plan for this one?

linusg · 2025-02-12T10:27:11Z

Not sure. Multiple options, in preferred order:

Someone who actually understands the math tells me how to update calcSetStringLimbCount()
We implement the workaround of using arch-dependent limits
We keep the status quo

Something else that came up while investigating this: the current implementation of calcSetStringLimbCount() overshoots by as much as 4x, whoever first implemented that got it wrong too. This part of std is in need of someone who can give it the necessary attention.

linusg · 2025-02-12T11:14:51Z

Another possible option would be to change std.math.big.Limb from usize to u64 but I'm not sure if that would have any negative implications for bigints on 32-bit platforms.

samy-00007 · 2025-02-21T14:16:21Z

The correct formula for calcSetStringLimbCount should be $1 + \Bigg \lfloor \frac{\text{string-len}}{\text{@bitSizeOf}(\text{Limb})} \cdot \log_{2}(base) \Bigg \rfloor$ (i can provide the proof if needed).

However, only using integers (like in sizeInBaseUpperBound) also leads to overshoot. For instance, when trying to use toString on a 1000 Limb BigInt, sizeInBaseUpperBound overshoots by 2069 bytes.

I think these functions should work:

fn sizeInBaseUpperBound(self: Const, base: u8) usize {
    const sign_byte = @as(u1, @intFromBool(!self.positive));
    const bit_count_abs: f32 = @floatFromInt(self.bitCountAbs());
    const base_f: f32 = @floatFromInt(base);
    return @as(usize, @intFromFloat(bit_count_abs / std.math.log2(base_f))) + 1 + sign_byte;
}

// assumes `string_len` doesn't account for minus signs if the number is negative
fn calcSetStringLimbCount(string_len: usize, base: u8) usize {
    const base_f: f32 = @intFromFloat(base);
    const string_len_f: f32 = @intFromFloat(string_len);
    return 1 + @as(usize, @intFromFloat(string_len_f * std.math.log2(base_f) / @bitSizeOf(Limb)));
}

linusg · 2025-02-21T20:22:19Z

Thanks for the help! That seems to be off by one:

zig test math/big/int_test.zig --zig-lib-dir '/home/linus/Dev/zig/lib'
thread 21245 panic: index out of bounds: index 25, len 24
/home/linus/Dev/zig/lib/std/math/big/int.zig:691:26: 0x1100554 in mulNoAlias (test)
        @memset(rma.limbs[0 .. a.limbs.len + b.limbs.len], 0);
                         ^
/home/linus/Dev/zig/lib/std/math/big/int.zig:664:30: 0x10ffca7 in mul (test)
        return rma.mulNoAlias(a_copy, b_copy, allocator);
                             ^
/home/linus/Dev/zig/lib/std/math/big/int.zig:322:21: 0x10ff3a4 in setString (test)
            self.mul(self.toConst(), ap_base, limbs_buffer, allocator);
                    ^
/home/linus/Dev/zig/lib/std/math/big/int.zig:2830:24: 0x10fee1f in setString (test)
        try m.setString(base, value, limbs_buffer, self.allocator);
                       ^
/home/linus/Dev/zig/lib/std/math/big/int_test.zig:1428:20: 0x1133dcd in test.divFloor #10932 (test)
    try a.setString(10, "40000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000");
                   ^
/home/linus/Dev/zig/lib/compiler/test_runner.zig:209:25: 0x11a0e25 in mainTerminal (test)
        if (test_fn.func()) |_| {
                        ^
/home/linus/Dev/zig/lib/compiler/test_runner.zig:57:28: 0x119af42 in main (test)
        return mainTerminal();
                           ^
/home/linus/Dev/zig/lib/std/start.zig:647:22: 0x119a4d2 in posixCallMainAndExit (test)
            root.main();
                     ^
/home/linus/Dev/zig/lib/std/start.zig:271:5: 0x119a0ad in _start (test)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)

samy-00007 · 2025-02-21T22:11:18Z

I have reproduced the issue, my function does not undershoot by one, but the multiply requires a temporary buffer which can be 1 larger than needed (to prevent overflows, but it doesn't happen in this case).
I guess the solution is to manually add some safety constant (like 2 or 3) to the result of calcSetStringLimbCount. I think it could also help reduce the risk of a potential problems due to imprecision of floats (in case the function were to actually slightly undershoot). This safety constant should also be added to sizeInBaseUpperBound for the same reasons.

Co-Authored-By: samy007 <[email protected]>

linusg · 2025-02-22T10:16:10Z

A @ceil() has fixed it without the magic offset. Should be fine for now, we can improve it further later if needed. I've added you as a co-author to the commit :)

andrewrk · 2025-02-22T21:30:15Z

error: error.Unexpected NTSTATUS=0xc0000243
C:\Users\CI\actions-runner-8\_work\zig\zig\lib\std\posix.zig:1059:52: 0x7ff74ed87439 in ftruncate (zig.exe.obj)
            else => return windows.unexpectedStatus(rc),
                                                   ^
C:\Users\CI\actions-runner-8\_work\zig\zig\lib\std\fs\File.zig:323:24: 0x7ff74ebdbbc9 in setEndPos (zig.exe.obj)
    try posix.ftruncate(self.handle, length);
                       ^

let's not miss the opportunity to add the additional error code to windows setEndPos implementation

linusg · 2025-02-23T01:05:39Z

-> #22981

linusg · 2025-02-23T10:10:27Z

Merging this given @ifreund's earlier approval.

tauoverpi reviewed Jan 14, 2025

View reviewed changes

lib/std/math/big/int.zig Show resolved Hide resolved

ifreund approved these changes Jan 17, 2025

View reviewed changes

ifreund force-pushed the bigint-base-36 branch from 4892401 to 1bd05a7 Compare January 17, 2025 19:49

std.math.big.int: Support strings up to base 36

301343b

Co-Authored-By: samy007 <[email protected]>

linusg force-pushed the bigint-base-36 branch from 1bd05a7 to 301343b Compare February 22, 2025 10:15

linusg mentioned this pull request Feb 23, 2025

std.posix: Handle USER_MAPPED_FILE in windows ftruncate() impl #22981

Merged

linusg enabled auto-merge (rebase) February 23, 2025 10:10

linusg merged commit c44f450 into ziglang:master Feb 23, 2025
9 checks passed

linusg deleted the bigint-base-36 branch February 23, 2025 12:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

std.math.big.int: Support strings up to base 36 #22461

std.math.big.int: Support strings up to base 36 #22461

linusg commented Jan 11, 2025

ifreund left a comment

linusg commented Jan 17, 2025

ifreund commented Jan 17, 2025

alexrp commented Feb 12, 2025

linusg commented Feb 12, 2025

linusg commented Feb 12, 2025

samy-00007 commented Feb 21, 2025 •

edited

Loading

linusg commented Feb 21, 2025

samy-00007 commented Feb 21, 2025 •

edited

Loading

linusg commented Feb 22, 2025

andrewrk commented Feb 22, 2025

linusg commented Feb 23, 2025

linusg commented Feb 23, 2025

std.math.big.int: Support strings up to base 36 #22461

std.math.big.int: Support strings up to base 36 #22461

Conversation

linusg commented Jan 11, 2025

ifreund left a comment

Choose a reason for hiding this comment

linusg commented Jan 17, 2025

ifreund commented Jan 17, 2025

alexrp commented Feb 12, 2025

linusg commented Feb 12, 2025

linusg commented Feb 12, 2025

samy-00007 commented Feb 21, 2025 • edited Loading

linusg commented Feb 21, 2025

samy-00007 commented Feb 21, 2025 • edited Loading

linusg commented Feb 22, 2025

andrewrk commented Feb 22, 2025

linusg commented Feb 23, 2025

linusg commented Feb 23, 2025

samy-00007 commented Feb 21, 2025 •

edited

Loading

samy-00007 commented Feb 21, 2025 •

edited

Loading