
[Feature Request] FP8 Support #156

Open
LeiWang1999 opened this issue Feb 4, 2025 · 12 comments

Comments

@LeiWang1999

It would be great to have FP8 support for converting tensors from PyTorch to DLPack. Currently, both PyTorch and TVM support FP8, but there is no direct way to convert tensors between them. Adding this support would improve interoperability and usability.

@tqchen
Member

tqchen commented Feb 4, 2025

Yes, we should do it. cc @leofang

@LeiWang1999
Author

LeiWang1999 commented Feb 4, 2025

Thanks tq, I found a temporary solution that works for me:

# Bit-cast the fp8 tensor to int8 so it can cross the DLPack boundary, then
# reinterpret the result as the matching fp8 dtype on the consumer side.
if arg.dtype in {torch.float8_e4m3fn, torch.float8_e4m3fnuz, torch.float8_e5m2, torch.float8_e5m2fnuz}:
    return ndarray.from_dlpack(
        to_dlpack_func(arg.view(torch.int8))
    ).view(dtype=float8_dtype_map[arg.dtype])
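
For a self-contained illustration of the same trick, here is a minimal sketch that round-trips an fp8 tensor through DLPack into NumPy instead of TVM, assuming a recent NumPy and the ml_dtypes package are installed (ndarray, to_dlpack_func, and float8_dtype_map in the snippet above are helpers from the surrounding code, not shown here):

    import numpy as np
    import torch
    import ml_dtypes

    # Producer side: an fp8 tensor that DLPack cannot yet describe directly.
    x = torch.randn(4, 4).to(torch.float8_e4m3fn)

    # Bit-cast to int8 (same element size), exchange via DLPack, then
    # reinterpret the raw bytes as fp8 again on the consumer side.
    y = np.from_dlpack(x.view(torch.int8)).view(ml_dtypes.float8_e4m3fn)

    assert y.shape == (4, 4)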

@potatomashed

potatomashed commented Feb 4, 2025

FP8 (and FP4) have become common practice in LLM training, and in MLC-Python we had to extend DLPack to support the various FP8 types: https://github.com/mlc-ai/mlc-python/blob/0a22cf87d1888cf39dcd2f856f866c7e1d41a568/include/mlc/c_api.h#L35-L45.

Regarding DLPack support, the trickiest issue with fp8/fp4 is that there are multiple different sub-types: e.g. float8_e3m4 and float8_e4m3 are both fp8 but have different exponent/mantissa splits. In the DLPack design, they will have to take two different dtype codes.

A common standard people may refer to is the ml-dtypes package from JAX, which is a superset of, and consistent with, PyTorch's fp8 types. This is what MLC-Python adopts as well. I'm happy to upstream full fp8 support from MLC-Python back to DLPack if the implementation looks okay.
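
For illustration, a small sketch (assuming a recent ml_dtypes release) showing that two fp8 subtypes really do differ in exponent/mantissa layout, which is why they cannot share one dtype code:

    import numpy as np
    import ml_dtypes

    # Compare the bit layouts of two fp8 subtypes; each would need its own dtype code.
    for dt in (ml_dtypes.float8_e3m4, ml_dtypes.float8_e4m3):
        info = ml_dtypes.finfo(dt)
        print(np.dtype(dt).name, "exponent bits:", info.nexp,
              "mantissa bits:", info.nmant, "max:", float(info.max))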

@tqchen
Member

tqchen commented Feb 4, 2025

Starting with the PyTorch types seems like a reasonable choice, given the available hardware support; contributions are welcome. We can add additional type codes if future needs arise.

@leofang
Collaborator

leofang commented Feb 4, 2025

cc @seberg @oleksandr-pavlyk @rgommers for visibility; let's try to get this discussed in the array API meeting this week.

@leofang
Collaborator

leofang commented Feb 4, 2025

I'm happy to upstream full fp8 support from MLC-Python back to DLPack if the implementation looks okay.

@potatomashed I am curious what exactly you meant by "full fp8 support" here; I assume you're referring to the set of additional enumerators needed for representing different fp8 subtypes. Are there things beyond this addition?

@potatomashed

@potatomashed I am curious what exactly you meant by "full fp8 support" here; I assume you're referring to the set of additional enumerators needed for representing different fp8 subtypes. Are there things beyond this addition?

Yep, that's just a few extra fp8 subtypes.

For reference, in PyTorch 2.6, the following fp8 subtypes are supported:

  • float8_e4m3fn
  • float8_e4m3fnuz
  • float8_e5m2
  • float8_e5m2fnuz

while ml-dtypes has some additional ones:

  • float8_e3m4
  • float8_e4m3
  • float8_e4m3b11fnuz
  • float8_e8m0fnu
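
As a quick check of the lists above (a sketch, assuming PyTorch >= 2.6 and a recent ml_dtypes release), one can simply probe which names each library exposes:

    import torch
    import ml_dtypes

    # Names taken from the two lists above; availability depends on library versions.
    names = [
        "float8_e4m3fn", "float8_e4m3fnuz", "float8_e5m2", "float8_e5m2fnuz",
        "float8_e3m4", "float8_e4m3", "float8_e4m3b11fnuz", "float8_e8m0fnu",
    ]
    for name in names:
        print(f"{name:<20} torch={hasattr(torch, name)} ml_dtypes={hasattr(ml_dtypes, name)}")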

@tqchen
Member

tqchen commented Feb 4, 2025

Some extra survey and thoughts:

Likely these subtypes from PyTorch are needed:

  • float8_e4m3fn
  • float8_e4m3fnuz
  • float8_e5m2
  • float8_e5m2fnuz

The latest Blackwell microscaling support appears to use float8_e8m0fnu as a scaling factor. Combined with float4_e2m1fn, that would enable microscaling support for FP4. In that case, the microscaling format would consist of two DLPack arrays (one float4_e2m1fn array for the weights and another for the scales).

It would be good to also discuss potential use cases for other data types, but this could be a good initial list (along with float4_e2m1fn).
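
To make the two-array layout concrete, here is a hedged sketch (assuming a recent ml_dtypes release that exposes float4_e2m1fn and float8_e8m0fnu; the group size of 32 and the dequantize_mx helper are illustrative, not part of DLPack):

    import numpy as np
    import ml_dtypes

    GROUP = 32  # illustrative MX group size

    # Array 1: quantized fp4 values; Array 2: one power-of-two scale per group of 32.
    values = np.zeros((4, 64), dtype=ml_dtypes.float4_e2m1fn)
    scales = np.ones((4, 64 // GROUP), dtype=ml_dtypes.float8_e8m0fnu)

    def dequantize_mx(values: np.ndarray, scales: np.ndarray) -> np.ndarray:
        """Reconstruct float32 data: each group of GROUP values shares one e8m0 scale."""
        v = values.astype(np.float32).reshape(values.shape[0], -1, GROUP)
        s = scales.astype(np.float32)[..., None]
        return (v * s).reshape(values.shape)

    print(dequantize_mx(values, scales).shape)  # (4, 64)

Exchanging such data over DLPack would then simply mean passing these two arrays separately.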

@potatomashed

potatomashed commented Feb 5, 2025

MX FP training is indeed a valid use case, e.g. MXFP4 or asymmetric MXFP4 (https://arxiv.org/abs/2411.09909). I don't think we have to go over the top speculating about future applications, but given the diverse set of existing sub-byte dtypes, I'd love to learn the DLPack maintainers' principles/rules on which dtypes to include.

@tqchen
Member

tqchen commented Feb 5, 2025

As of now we focus on reasonably stabilized types, mainly because the goal is to enable frameworks to exchange data and to remain stable over time.

Notably, MX formats are usually stored in two NDArrays, e.g. a float8_e8m0fnu group scale plus float4_e2m1fn values. That means for now we can focus on the individual component types, i.e. float8_e8m0fnu and float4_e2m1fn.

@potatomashed

Notably, MX format usually are stored in two NDArrays

This assumption doesn't always hold, but it's a good starting point.

@leofang
Collaborator

leofang commented Feb 5, 2025

xref: pytorch/pytorch#146414
