binary_to_float doesn't correctly handle scientific notation with no decimal places specified, such as "1e0" #9061

Peter-Searby · 2024-11-14T16:22:54Z

Describe the bug
The binary_to_float built-in function should support numbers in scientific notation that have no decimal point, such as "1e0".

To Reproduce
The following raises a badarg error:
binary_to_float(<<"1e0">>).

Expected behavior
1.0 = binary_to_float(<<"1e0">>).

Affected versions
OTP 27 and all prior versions with this function.

Additional context
Other languages, such as C++ and Go, emit this format when all decimal places are 0, so support for it is important for cross-language compatibility.

michalmuskala · 2024-11-15T19:52:29Z

Native support for this format could slightly speed up json. It does support the "dotless" format, and today we have to re-allocate the float string before parsing to support that:

otp/lib/stdlib/src/json.erl

Line 1167 in 2b23f5b

Token = <<Prefix/binary, ".0e", Suffix/binary>>,
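For anyone who needs this outside of json today, the same rewrite can be sketched as a small wrapper. This is only an illustration of the idea in the line above, not the actual json.erl code: the module name `dotless_float` and the function `to_float/1` are made up for this example, and it assumes the input is otherwise a float string that binary_to_float already accepts.

```erlang
-module(dotless_float).
-export([to_float/1]).

%% Hypothetical helper, not part of OTP.
%% If the mantissa has no decimal point, rebuild the binary with ".0"
%% inserted before the exponent, then hand it to binary_to_float/1.
to_float(Bin) when is_binary(Bin) ->
    %% Split on the first exponent marker, if there is one.
    case binary:split(Bin, [<<"e">>, <<"E">>]) of
        [Mantissa, Exponent] ->
            case binary:match(Mantissa, <<".">>) of
                nomatch ->
                    %% "1e0" becomes "1.0e0"; this allocates a new binary.
                    binary_to_float(<<Mantissa/binary, ".0e", Exponent/binary>>);
                _ ->
                    %% Already has a decimal point, pass through unchanged.
                    binary_to_float(Bin)
            end;
        [_] ->
            %% No exponent at all; binary_to_float/1 decides what is valid.
            binary_to_float(Bin)
    end.
```

With this, `dotless_float:to_float(<<"1e0">>)` goes through the rewritten form `<<"1.0e0">>`. The extra binary construction is exactly the re-allocation cost mentioned above.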

josevalim · 2024-11-25T19:29:51Z

In my mind, the binary_to_float reflects the formats supported by Erlang itself, but I don't know if that's strictly true. If that's the case, would we change the Erlang grammar as well? And this may also require changes to float_to_binary:

iex(3)> :erlang.float_to_binary(10000000.0, [:short])
"1.0e7"

jhogberg · 2024-11-26T07:49:44Z

Thanks for raising this issue! I've been going back and forth on this. I think it makes sense to support it (at least with an option) together with bases 2 and 16 from #9106, but I need to think a bit more about the implementation.

Currently it lands in strtod, which should make us compatible with C++, but we have this awkward pre-processing step to allow both "," and "." as a decimal separator, which rejects many things that are valid for strtod. I'm debating with myself whether to implement float conversion ourselves to get around that.

richcarl · 2024-11-26T10:38:12Z

Erlang has held on to the principle "in order to be a floating point number, it has to actually contain a decimal point", which goes all the way back to its Prolog roots, but you can argue that floating point literals are such a common ground that they should be universally interchangeable across languages. I think it was a design mistake by the C folks to allow the dotless 1e0 version, but since it's there in C, Java, JS, etc., we'd better support it too.

Possibly we could accept the dotless form in the scanner and conversion functions, but let the compiler emit a warning if you actually use that form in code, saying something like "for readability, include the decimal point".

richcarl · 2024-11-26T10:46:15Z

> Currently it lands in strtod which should make us compatible with C++, but we have this awkward pre-processing step to allow both , and . as a decimal separator

Had to test this for myself, because I'd never heard about it, but it's there! But not documented, it seems? Can the commas be dropped?

And if we allow "1e0" we should also allow ".01" (perhaps not warranting a compiler warning).
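Until something like that is supported natively, the ".01" case can be handled the same way as the dotless one, by padding the string before conversion. A minimal sketch, assuming a made-up module `leading_dot_float` with a `to_float/1` function (neither exists in OTP):

```erlang
-module(leading_dot_float).
-export([to_float/1]).

%% Hypothetical helper, not part of OTP.
%% Adds the missing leading zero so that binary_to_float/1 accepts the input.
to_float(<<".", _/binary>> = Bin) ->
    %% ".01" becomes "0.01"
    binary_to_float(<<"0", Bin/binary>>);
to_float(<<Sign, ".", Rest/binary>>) when Sign =:= $-; Sign =:= $+ ->
    %% "-.5" becomes "-0.5"
    binary_to_float(<<Sign, "0.", Rest/binary>>);
to_float(Bin) when is_binary(Bin) ->
    %% Anything else is passed through unchanged.
    binary_to_float(Bin).
```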

bjorng · 2024-11-27T06:58:01Z

Yes, I think we should drop support for commas. I didn't know they were allowed.

jhogberg · 2024-11-27T08:11:09Z

There’s a wrinkle to that. If we keep using strtod we need to parse the float halfway ourselves anyway so that we can change period into comma on systems whose locale says the decimal point is a comma. This is going to be ugly no matter what we do.

I think we should consider vendoring something like https://github.com/fastfloat/fast_float to get around these issues.

richcarl · 2024-11-27T13:01:50Z

> There’s a wrinkle to that. If we keep using strtod we need to parse the float halfway ourselves anyway so that we can change period into comma on systems whose locale says the decimal point is a comma.

I took a quick look, and only C/C++ strtod cares about locale. Java and Javascript only accept '.'. Seems reasonable to do the same.
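Code that wants the '.'-only behaviour today, regardless of what binary_to_float does with commas, could check for them up front. This is only a sketch: the module `strict_float` and its `to_float/1` are hypothetical names, and raising badarg for a comma is a choice made for this example, not something OTP has committed to.

```erlang
-module(strict_float).
-export([to_float/1]).

%% Hypothetical wrapper, not part of OTP.
%% Rejects any input containing ',' before it reaches binary_to_float/1,
%% so only '.' can act as the decimal separator.
to_float(Bin) when is_binary(Bin) ->
    case binary:match(Bin, <<",">>) of
        nomatch -> binary_to_float(Bin);
        _Found -> error(badarg)
    end.
```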

jhogberg · 2024-11-27T13:11:42Z

>> There’s a wrinkle to that. If we keep using strtod we need to parse the float halfway ourselves anyway so that we can change period into comma on systems whose locale says the decimal point is a comma.

> I took a quick look, and only C/C++ strtod cares about locale. Java and Javascript only accept '.'. Seems reasonable to do the same.

Indeed, my comment was that we have to handle this nonsense if our implementation lands in strtod (converting . to , if locale only accepts ,), so we should consider using something else entirely.

brigadier · 2024-11-30T16:46:48Z

> Yes, I think we should drop support for commas. I didn't know they were allowed.

Likely because the comma is the default decimal separator in quite a few locales.

Peter-Searby added the "bug" label Nov 14, 2024

IngelaAndin added the "team:VM" label Nov 15, 2024

jhogberg self-assigned this Nov 18, 2024
