67f2865
to
8aca043
Compare
@beetrees or @tspiteri would you be able to help me figure this out? The version of fmaf16 in this PR says
#![feature(f16)]
use az::Az;
use rug::float::Round;
fn main() {
let x = f16::from_bits(0x0001);
let y = f16::from_bits(0x3bf7);
let z = f16::from_bits(0x0000);
let mut xf = rug::Float::with_val(11, x);
let yf = rug::Float::with_val(11, y);
let zf = rug::Float::with_val(11, z);
let ordering = xf.mul_add_round(&yf, &zf, Round::Nearest);
xf.subnormalize_ieee_round(ordering, Round::Nearest);
let rug_res = (&xf).az::<f16>();
println!("rug: {xf} {rug_res:?}");
println!("std: {:?}", x.mul_add(y, z));
}
Prints:
Also checked against Julia to get a third opinion, which agrees with
> x = fma(reinterpret(Float16, 0x0001), reinterpret(Float16, 0x3bf7), reinterpret(Float16, 0x0000))
Float16(6.0e-8)
> reinterpret(UInt16, x)
0x0001 |
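For anyone who wants to check this input without `rug` or nightly `f16`, here is a minimal sketch on stable Rust (it assumes Rust 1.77+ for `round_ties_even`). The helper names `pow2`, `f16_bits_to_f64` and `fma_f16_subnormal_reference` are made up for illustration and are not part of this PR. It only covers finite inputs whose result lands below the smallest normal `f16`, where evaluating `x * y + z` in `f64` should be exact, so a single rounding onto the `2^-24` subnormal grid gives the reference answer.

```rust
/// 2^e as an f64, built from the bit pattern (valid for -1022 <= e <= 1023).
fn pow2(e: i32) -> f64 {
    f64::from_bits(((1023 + e) as u64) << 52)
}

/// Decode finite IEEE 754 binary16 bits into an f64 (exact: every f16 fits in f64).
/// Returns None for infinities and NaNs, which this sketch does not handle.
fn f16_bits_to_f64(bits: u16) -> Option<f64> {
    let sign = if bits & 0x8000 != 0 { -1.0 } else { 1.0 };
    let exp = i32::from((bits >> 10) & 0x1f);
    let frac = f64::from(bits & 0x3ff);
    match exp {
        0 => Some(sign * frac * pow2(-24)), // zero or subnormal
        0x1f => None,                       // infinity or NaN
        _ => Some(sign * (1.0 + frac / 1024.0) * pow2(exp - 15)),
    }
}

/// Hypothetical reference for fma(x, y, z) on f16 bit patterns, restricted to
/// results with magnitude below 2^-14 (zero or subnormal, or rounding up to 2^-14).
fn fma_f16_subnormal_reference(x: u16, y: u16, z: u16) -> Option<u16> {
    // The product of two 11-bit significands is exact in f64. Every term is a
    // multiple of 2^-48, so a sum below 2^-14 needs at most 34 bits and is exact too.
    let exact = f16_bits_to_f64(x)? * f16_bits_to_f64(y)? + f16_bits_to_f64(z)?;
    let mag = exact.abs();
    if !(mag < pow2(-14)) {
        return None; // outside the range this sketch covers
    }
    // Subnormal f16 values are integer multiples of 2^-24: scale, round once.
    let steps = (mag * pow2(24)).round_ties_even();
    let sign: u16 = if exact.is_sign_negative() { 0x8000 } else { 0 };
    Some(sign | steps as u16)
}
```

Calling `fma_f16_subnormal_reference(0x0001, 0x3bf7, 0x0000)` yields `Some(0x0001)`, the same bit pattern as the Julia session above: the exact product is 2039/2048 of the smallest subnormal, which rounds up to one step.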
It looks like what is happening there is that I created an issue: https://gitlab.com/tspiteri/rug/-/issues/78 |
Great find, thanks for looking into it. This is a problem with From a quick check it seems like this could be worked around by reducing and reexpanding the precision, which shouldn't reallocate. Any idea if there is a better way?
let mut xf = rug::Float::with_val(24, f32::from_bits(1));
xf *= 0.75;
subnormalize(&mut xf);
xf *= 16.0;
subnormalize(&mut xf);
assert_eq!(xf.to_f32(), f32::from_bits(1) * 0.75 * 16.0);
fn subnormalize(xf: &mut rug::Float) {
let e = xf.get_exp().unwrap();
if e < -126 {
xf.set_prec(24_u32.saturating_sub(-e as u32 - 126));
xf.set_prec(24);
}
} |
The Yes, all the variants of |
After the change mentioned in Rug issue 78, which is now in the master branch, the code above prints:
|
Thank you for the quick fix, it seems to work great! The failure in the extensive tests looks like a real failure (interestingly, Julia returns the same result, though std's implementation returns the MPFR result). |
Indeed the expected result is correct. This is a case of
I'm not entirely sure where exactly the algorithm goes wrong, but I think it is because the test for halfway cases doesn't account for the sum being subnormal, so the f32-result (which is not subnormal) has even more excess precision. |
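To make the "excess precision" point concrete, the following is a rough sketch of how many significand bits `f16` offers at a given binary exponent, and how many extra bits a normal `f32` intermediate carries beyond that. The function names are hypothetical, and this illustrates only the precision argument, not the algorithm in this PR. It assumes exponents within the finite `f16` range, where every `f32` value is normal and has 24 bits.

```rust
/// Significand bits (including the implicit bit) that binary16 can represent
/// for a value whose magnitude lies in [2^e, 2^(e+1)). Overflow is ignored.
fn f16_significand_bits(e: i32) -> u32 {
    const MIN_NORMAL_EXP: i32 = -14; // smallest normal f16 is 2^-14
    const FULL_PRECISION: i32 = 11; // 10 stored bits + 1 implicit bit
    if e >= MIN_NORMAL_EXP {
        FULL_PRECISION as u32
    } else {
        // Each binade below 2^-14 loses one bit; nothing is left below 2^-24.
        (FULL_PRECISION - (MIN_NORMAL_EXP - e)).max(0) as u32
    }
}

/// Extra significand bits a normal f32 (24 bits) holds beyond the f16 grid
/// at exponent e. A halfway check that assumes 11 f16 bits would expect 13.
fn f32_excess_bits(e: i32) -> u32 {
    24 - f16_significand_bits(e)
}
```

For a normal `f16` result the excess is 13 bits, but at `e = -24` (the smallest subnormal) it grows to 23 bits, so a halfway test written for the normal case would be looking at the wrong bit position.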
ec64899
to
601b5a2
Compare
Recreated as rust-lang/compiler-builtins#861 |
Split from #390 since I think the f128 version will be trickier.