# Add `bf16`, `f64f64` and `f80` types (rust-lang/rfcs#3456)
- Feature Name: `add-bf16-f64f64-and-f80-type`
- Start Date: 2023-07-10
- RFC PR: [rust-lang/rfcs#3456](https://github.com/rust-lang/rfcs/pull/3456)
- Rust Issue: [rust-lang/rfcs#2629](https://github.com/rust-lang/rfcs/issues/2629)
# Summary
[summary]: #summary
This RFC proposes new floating-point types to enhance FFI with specific targets:

- `bf16`, a builtin type for the 'brain floating point' format, widely used in machine learning; it differs from the IEEE 754 standard `binary16` representation
- `f64f64` in `core::arch`, the legacy extended float format used on the PowerPC architecture
- `f80` in `core::arch`, the extended float format used on the x86 and x86_64 architectures

This proposal also introduces `c_longdouble` in `core::ffi` to represent the correct format for `long double` in C.
# Motivation
[motivation]: #motivation

The types listed above may be widely used in existing native code, but they are not available on all targets. Their underlying representations differ from the 16-bit and 128-bit binary floating-point formats defined in IEEE 754.

On the respective targets (namely PowerPC and x86), the target-specific extended type is what `long double` refers to, which makes `long double` ambiguous in the context of FFI. Defining `c_longdouble` therefore helps interoperation with C code that uses the `long double` type.
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

`bf16` is available on all targets. The operators and constants defined for `f32` are also available for `bf16`.

For `f64f64` and `f80`, availability is limited to the following targets, though this may change over time:

- `f64f64` is supported on `powerpc-*` and `powerpc64(le)-*`, and is available in `core::arch::{powerpc, powerpc64}`
- `f80` is supported on `i[356]86-*` and `x86_64-*`, and is available in `core::arch::{x86, x86_64}`
The operators and constants defined for `f32` and `f64` are available for `f64f64` and `f80` in their respective arch-specific modules.
None of the proposed types has a literal representation. Instead, values are converted to or from IEEE 754 compliant types.
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation
## `bf16` type
`bf16` consists of 1 sign bit, 8 bits of exponent, and 7 bits of mantissa. Some ARM, AArch64, x86 and x86_64 targets support `bf16` operations natively. On other targets, values will be promoted to `f32` before computation and truncated back to `bf16` afterwards.
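To make the promote-then-truncate scheme concrete, here is an illustrative sketch (not part of the RFC; the function names are made up) of a software fallback converting between `bf16` bit patterns and `f32`, rounding to nearest, ties to even, when narrowing:

```rust
/// Narrow an f32 to the nearest bf16 bit pattern (round-to-nearest-even).
/// NaN handling is simplified: keep the sign and top payload bits, and set
/// a quiet bit so truncation cannot accidentally yield an infinity pattern.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7fff plus the lowest kept bit: this rounds the discarded low
    // 16 bits to nearest, breaking ties toward an even result.
    let round_bit = (bits >> 16) & 1;
    let rounded = bits.wrapping_add(0x7fff + round_bit);
    (rounded >> 16) as u16
}

/// Widen a bf16 bit pattern to f32. This is exact, because a bf16 value
/// is bitwise a truncated f32 (same sign and exponent fields).
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}
```

Widening is always lossless, which is why the narrowing direction is where all the rounding questions raised below arise.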
`bf16` will generate the `bfloat` type in LLVM IR.
## `f64f64` type
`f64f64` is the legacy extended floating-point format used on PowerPC targets. It consists of two `f64`s, with the first acting as a normal `f64` and the second providing an extended mantissa.
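To illustrate how a second `f64` can extend the mantissa, the following sketch (illustrative, not from the RFC) shows Knuth's TwoSum, the basic building block of double-double arithmetic: it splits an `f64` addition into the rounded result plus the exact rounding error:

```rust
/// Knuth's TwoSum: returns (s, e) where s is a + b rounded to f64 and e is
/// the exact rounding error, so a + b == s + e as real numbers. A
/// double-double value is stored as such a (hi, lo) pair.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let b_virtual = s - a;          // the portion of b that reached s
    let a_virtual = s - b_virtual;  // the portion of a that reached s
    let e = (a - a_virtual) + (b - b_virtual);
    (s, e)
}
```

For example, `two_sum(1.0, 1e-20)` keeps the tiny addend in the low half instead of losing it, which a single `f64` addition would.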
The following `From` traits are implemented in `core::arch::{powerpc, powerpc64}` for conversion between `f64f64` and other floating-point types:
```rust
impl From<bf16> for f64f64 { /* ... */ }
impl From<f32> for f64f64 { /* ... */ }
impl From<f64> for f64f64 { /* ... */ }
```
`f64f64` will generate the `ppc_fp128` type in LLVM IR.
## `f80` type
`f80` represents the extended-precision floating-point type on x86 targets, with 1 sign bit, 15 bits of exponent, and 63 bits of mantissa (plus an explicit integer bit, unlike the IEEE interchange formats).
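As an illustration (not part of the RFC; the helper names are made up), the layout can be decomposed from the low 80 bits of a `u128`:

```rust
/// Split an 80-bit extended-precision bit pattern (held in the low 80 bits
/// of a u128) into sign, 15-bit biased exponent, and 64-bit significand.
fn f80_parts(bits: u128) -> (u8, u16, u64) {
    let sign = ((bits >> 79) & 1) as u8;
    let exponent = ((bits >> 64) & 0x7fff) as u16;
    let significand = bits as u64; // bit 63 is the explicit integer bit
    (sign, exponent, significand)
}

/// Rebuild an 80-bit pattern from its fields.
fn f80_from_parts(sign: u8, exponent: u16, significand: u64) -> u128 {
    ((sign as u128) << 79)
        | (((exponent & 0x7fff) as u128) << 64)
        | significand as u128
}
```

For instance, 1.0 in this format has a biased exponent of 0x3fff (bias 16383) and only the explicit integer bit set in the significand.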
The following `From` traits are implemented in `core::arch::{x86, x86_64}` for conversion between `f80` and other floating-point types:
```rust
impl From<bf16> for f80 { /* ... */ }
impl From<f32> for f80 { /* ... */ }
impl From<f64> for f80 { /* ... */ }
```
`f80` will generate the `x86_fp80` type in LLVM IR.
## `c_longdouble` type in FFI
`core::ffi::c_longdouble` will always represent whatever `long double` represents in C. Rust will defer to the compiler backend (LLVM) for what exactly this is, but it will approximately be:
- `f80` (extended precision) on `x86` and `x86_64`
- `f64` (double precision) with MSVC
- `f128` (quadruple precision) on AArch64
- `f64f64` on PowerPC
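The mapping above can be pictured as a set of target-conditional type aliases. The following is only an illustrative sketch and does not compile today, since the `f80`, `f64f64` and `f128` names it references are proposed types (in this RFC and in RFC 3453), not existing ones:

```rust
// Sketch only — not the actual definition of core::ffi::c_longdouble.
#[cfg(all(any(target_arch = "x86", target_arch = "x86_64"),
          not(target_env = "msvc")))]
pub type c_longdouble = core::arch::x86_64::f80;       // proposed here

#[cfg(target_env = "msvc")]
pub type c_longdouble = f64;          // MSVC: long double == double

#[cfg(target_arch = "aarch64")]
pub type c_longdouble = f128;         // proposed in RFC 3453

#[cfg(target_arch = "powerpc64")]
pub type c_longdouble = core::arch::powerpc64::f64f64; // proposed here
```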
# Drawbacks
[drawbacks]: #drawbacks
`bf16` is not an IEEE 754 standard type, so adding it as a primitive type may break the existing consistency among the builtin float types. The truncation performed after each calculation on targets without native `bf16` support also differs from how Rust treats precision loss elsewhere.
`c_longdouble` is not uniquely determined by architecture, OS, and ABI: on the same target, C compiler options may change which representation `long double` uses.
# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives
The [half](https://github.com/starkat99/half-rs) crate provides an implementation of the binary16 and bfloat16 formats.
However, besides the usage inconsistency between primitive types and types from crates, there are still issues with such bindings.
The availability of additional float types depends heavily on the CPU, OS, ABI, and enabled features of each target. The evolution of LLVM may also unlock these types on new targets over time. The compiler is therefore the best place to handle this.
Most such crates define their types on top of C bindings. However, the extended float type definitions in C are complex and confusing: the meaning of `long double` and `_Float128` varies by target and compiler options. Implementing these types in the Rust compiler helps maintain a stable codegen interface.
In addition, since third-party tools also rely on Rust compiler internals, implementing the additional float types in the compiler helps those tools recognize them.
# Prior art
[prior-art]: #prior-art
There is a previous proposal for an `f16b` type to represent `bfloat16`: https://github.com/rust-lang/rfcs/pull/2690.
# Unresolved questions
[unresolved-questions]: #unresolved-questions
This proposal does not cover FFI with C's `_Float128` and `__float128` types. [RFC #3453](https://github.com/rust-lang/rfcs/pull/3453) focuses on a type conforming to IEEE 754 `binary128`.
Although statements like "target X supports type A" are used above, some targets may only support a type when certain target features are enabled. Such features are assumed to be enabled, with precedents like `core::arch::x86_64::__m256d` (which is part of AVX).
The representation of `long double` in C may depend on compiler options. For example, with Clang on `powerpc64le-*`, the `-mabi=ieeelongdouble`/`-mabi=ibmlongdouble`/`-mlong-double-64` options set `long double` to `fp128`/`ppc_fp128`/`double` in LLVM, respectively. Currently, the default option is assumed.
# Future possibilities
[future-possibilities]: #future-possibilities

[LLVM reference for floating types]: https://llvm.org/docs/LangRef.html#floating-point-types