Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binary_to_float doesn't correctly handle scientific notation with no decimal places specified, such as "1e0" #9061

Open
Peter-Searby opened this issue Nov 14, 2024 · 10 comments
Assignees
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM

Comments

@Peter-Searby
Copy link

Describe the bug
The binary_to_float built in function should support numbers in scientific notation without decimal places specified.

To Reproduce
The following raises a badarg error:
binary_to_float(<<"1e0">>).

Expected behavior
1 = binary_to_float(<<"1e0">>).

Affected versions
OTP 27 and all prior versions with this function.

Additional context
Other languages, such as C++ and Golang will use this format when all decimal places are 0, so support for this format is important for cross-compatibility.

@Peter-Searby Peter-Searby added the bug Issue is reported as a bug label Nov 14, 2024
@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Nov 15, 2024
@michalmuskala
Copy link
Contributor

Native support for this format could slightly speed up json - it does support the "dotless" format and today we have to re-allocate the float string before parsing to support that

Token = <<Prefix/binary, ".0e", Suffix/binary>>,

@jhogberg jhogberg self-assigned this Nov 18, 2024
@josevalim
Copy link
Contributor

In my mind, the binary_to_float reflects the formats supported by Erlang itself, but I don't know if that's strictly true. If that's the case, would we change the Erlang grammar as well? And this may also require changes to float_to_binary:

iex(3)> :erlang.float_to_binary(10000000.0, [:short])
"1.0e7"

@jhogberg
Copy link
Contributor

Thanks for raising this issue! I've been going back and forth on this, I think it makes sense to support it (at least with an option) together with bases 2 and 16 from #9106, but I need to think a bit more about the implementation.

Currently it lands in strtod which should make us compatible with C++, but we have this awkward pre-processing step to allow both , and . as a decimal separator which rejects many things that are valid for strtod. I'm debating myself on whether to implement float conversion by ourselves to get around that.

@richcarl
Copy link
Contributor

Erlang has held on to the principle "in order to be a floating point number, it has to actually have contain a decimal point", which goes all the way back to its Prolog roots, but you can argue that floating point literals are such a common ground that they should be universally interchangeable across languages. I think it was a design mistake by the C folks to allow the dotless 1e0 version, but since it's there in C, Java, JS, etc., we'd better support it too.

Possibly we could accept the dotless form in the scanner and conversion functions, but let the compiler emit a warning if you actually use that form in code, saying something like "for readability, include the decimal point".

@richcarl
Copy link
Contributor

Currently it lands in strtod which should make us compatible with C++, but we have this awkward pre-processing step to allow both , and . as a decimal separator

Had to test this for myself, because I'd never heard about it, but it's there! But not documented, it seems? Can the commas be dropped?

And if we allow "1e0" we should also allow ".01" (perhaps not warranting a compiler warning).

@bjorng
Copy link
Contributor

bjorng commented Nov 27, 2024

Yes, I think we should drop support for commas. I didn't know they were allowed.

@jhogberg
Copy link
Contributor

There’s a wrinkle to that. If we keep using strtod we need to parse the float halfway ourselves anyway so that we can change period into comma on systems whose locale says the decimal point is a comma. This is going to be ugly no matter what we do.

I think we should consider vendoring something like https://github.com/fastfloat/fast_float to get around these issues.

@richcarl
Copy link
Contributor

There’s a wrinkle to that. If we keep using strtod we need to parse the float halfway ourselves anyway so that we can change period into comma on systems whose locale says the decimal point is a comma.

I took a quick look, and only C/C++ strtod cares about locale. Java and Javascript only accept '.'. Seems reasonable to do the same.

@jhogberg
Copy link
Contributor

There’s a wrinkle to that. If we keep using strtod we need to parse the float halfway ourselves anyway so that we can change period into comma on systems whose locale says the decimal point is a comma.

I took a quick look, and only C/C++ strtod cares about locale. Java and Javascript only accept '.'. Seems reasonable to do the same.

Indeed, my comment was that we have to handle this nonsense if our implementation lands in strtod (converting . to , if locale only accepts ,), so we should consider using something else entirely.

@brigadier
Copy link
Contributor

Yes, I think we should drop support for commas. I didn't know they were allowed.

Likely because in quite a few locales comma is the default decimal separator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

8 participants