Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.math.big.int: Support strings up to base 36 #22461

Merged
merged 1 commit into from
Feb 23, 2025

Conversation

linusg
Copy link
Collaborator

@linusg linusg commented Jan 11, 2025

Closes #22206.

Copy link
Member

@ifreund ifreund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, Thanks! I'll rebase this branch for you and set it to auto merge when CI passes

@linusg
Copy link
Collaborator Author

linusg commented Jan 17, 2025

It looks like calcSetStringLimbCount() does not work with bases up to 36 in 32 bit builds (limb_bits is derived from usize) - I only tested a 64 bit build locally and didn't notice. Not sure how to fix that easily tbh.

@ifreund
Copy link
Member

ifreund commented Jan 17, 2025

It looks like calcSetStringLimbCount() does not work with bases up to 36 in 32 bit builds (limb_bits is derived from usize) - I only tested a 64 bit build locally and didn't notice. Not sure how to fix that easily tbh.

Ah indeed, I should have read the CI failures more carefully rather than assuming they were likely outdated transient failures that would be solved by a rebase.

I don't see a trivial solution to that issue either, solving it properly seems like it would require some clear thinking about math that I don't want to get into right now.

We could restrict the base to @min(limb_bits, 36) and document this limitation for now if that unblocks your use-case.

@alexrp
Copy link
Member

alexrp commented Feb 12, 2025

Just for triage purposes, what's the plan for this one?

@linusg
Copy link
Collaborator Author

linusg commented Feb 12, 2025

Not sure. Multiple options, in preferred order:

  • Someone who actually understands the math tells me how to update calcSetStringLimbCount()
  • We implement the workaround of using arch-dependent limits
  • We keep the status quo

Something else that came up while investigating this: the current implementation of calcSetStringLimbCount() overshoots by as much as 4x, whoever first implemented that got it wrong too. This part of std is in need of someone who can give it the necessary attention.

@linusg
Copy link
Collaborator Author

linusg commented Feb 12, 2025

Another possible option would be to change std.math.big.Limb from usize to u64 but I'm not sure if that would have any negative implications for bigints on 32-bit platforms.

@samy-00007
Copy link
Contributor

samy-00007 commented Feb 21, 2025

The correct formula for calcSetStringLimbCount should be $1 + \Bigg \lfloor \frac{\text{string-len}}{\text{@bitSizeOf}(\text{Limb})} \cdot \log_{2}(base) \Bigg \rfloor$ (i can provide the proof if needed).

However, only using integers (like in sizeInBaseUpperBound) also leads to overshoot. For instance, when trying to use toString on a 1000 Limb BigInt, sizeInBaseUpperBound overshoots by 2069 bytes.

I think these functions should work:

fn sizeInBaseUpperBound(self: Const, base: u8) usize {
    const sign_byte = @as(u1, @intFromBool(!self.positive));
    const bit_count_abs: f32 = @floatFromInt(self.bitCountAbs());
    const base_f: f32 = @floatFromInt(base);
    return @as(usize, @intFromFloat(bit_count_abs / std.math.log2(base_f))) + 1 + sign_byte;
}

// assumes `string_len` doesn't account for minus signs if the number is negative
fn calcSetStringLimbCount(string_len: usize, base: u8) usize {
    const base_f: f32 = @intFromFloat(base);
    const string_len_f: f32 = @intFromFloat(string_len);
    return 1 + @as(usize, @intFromFloat(string_len_f * std.math.log2(base_f) / @bitSizeOf(Limb)));
}

@linusg
Copy link
Collaborator Author

linusg commented Feb 21, 2025

Thanks for the help! That seems to be off by one:

zig test math/big/int_test.zig --zig-lib-dir '/home/linus/Dev/zig/lib'
thread 21245 panic: index out of bounds: index 25, len 24
/home/linus/Dev/zig/lib/std/math/big/int.zig:691:26: 0x1100554 in mulNoAlias (test)
        @memset(rma.limbs[0 .. a.limbs.len + b.limbs.len], 0);
                         ^
/home/linus/Dev/zig/lib/std/math/big/int.zig:664:30: 0x10ffca7 in mul (test)
        return rma.mulNoAlias(a_copy, b_copy, allocator);
                             ^
/home/linus/Dev/zig/lib/std/math/big/int.zig:322:21: 0x10ff3a4 in setString (test)
            self.mul(self.toConst(), ap_base, limbs_buffer, allocator);
                    ^
/home/linus/Dev/zig/lib/std/math/big/int.zig:2830:24: 0x10fee1f in setString (test)
        try m.setString(base, value, limbs_buffer, self.allocator);
                       ^
/home/linus/Dev/zig/lib/std/math/big/int_test.zig:1428:20: 0x1133dcd in test.divFloor #10932 (test)
    try a.setString(10, "40000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000");
                   ^
/home/linus/Dev/zig/lib/compiler/test_runner.zig:209:25: 0x11a0e25 in mainTerminal (test)
        if (test_fn.func()) |_| {
                        ^
/home/linus/Dev/zig/lib/compiler/test_runner.zig:57:28: 0x119af42 in main (test)
        return mainTerminal();
                           ^
/home/linus/Dev/zig/lib/std/start.zig:647:22: 0x119a4d2 in posixCallMainAndExit (test)
            root.main();
                     ^
/home/linus/Dev/zig/lib/std/start.zig:271:5: 0x119a0ad in _start (test)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)

@samy-00007
Copy link
Contributor

samy-00007 commented Feb 21, 2025

I have reproduced the issue, my function does not undershoot by one, but the multiply requires a temporary buffer which can be 1 larger than needed (to prevent overflows, but it doesn't happen in this case).
I guess the solution is to manually add some safety constant (like 2 or 3) to the result of calcSetStringLimbCount. I think it could also help reduce the risk of a potential problems due to imprecision of floats (in case the function were to actually slightly undershoot). This safety constant should also be added to sizeInBaseUpperBound for the same reasons.

@linusg
Copy link
Collaborator Author

linusg commented Feb 22, 2025

A @ceil() has fixed it without the magic offset. Should be fine for now, we can improve it further later if needed. I've added you as a co-author to the commit :)

@andrewrk
Copy link
Member

error: error.Unexpected NTSTATUS=0xc0000243
C:\Users\CI\actions-runner-8\_work\zig\zig\lib\std\posix.zig:1059:52: 0x7ff74ed87439 in ftruncate (zig.exe.obj)
            else => return windows.unexpectedStatus(rc),
                                                   ^
C:\Users\CI\actions-runner-8\_work\zig\zig\lib\std\fs\File.zig:323:24: 0x7ff74ebdbbc9 in setEndPos (zig.exe.obj)
    try posix.ftruncate(self.handle, length);
                       ^

let's not miss the opportunity to add the additional error code to windows setEndPos implementation

@linusg
Copy link
Collaborator Author

linusg commented Feb 23, 2025

-> #22981

@linusg
Copy link
Collaborator Author

linusg commented Feb 23, 2025

Merging this given @ifreund's earlier approval.

@linusg linusg enabled auto-merge (rebase) February 23, 2025 10:10
@linusg linusg merged commit c44f450 into ziglang:master Feb 23, 2025
9 checks passed
@linusg linusg deleted the bigint-base-36 branch February 23, 2025 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support base 2-36 in bigint toString() functions
6 participants