[NVFP4] Expand dynamic types, clean-up conditions #325

Merged
merged 10 commits into from
May 28, 2025

Conversation

@dsikka (Collaborator) commented May 27, 2025

Summary:

  • Expand `dynamic` to include a `local` enum value - with this change, the following conditions are now accepted:
1. If `dynamic` is `True` --> all parameters are generated on the fly
2. If `dynamic` is `False` --> all parameters are statically generated and saved to disk
3. If `dynamic` is `"local"` --> only the local quantization parameters are generated on the fly
  • Expand nvfp4a16 to use `tensor_group` --> this strategy is now associated with the initialization of `global_scale`s for weights and activations, rather than being gated on `is_fp4`
  • Clean up and re-order the initialization conditions
  • Expand testing
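The three `dynamic` conditions above could be sketched roughly as follows. Note that `DynamicType` and `should_compute_on_the_fly` are illustrative names, not the library's actual API; the point is the explicit equality check against the enum value, which avoids the string `"local"` being treated as merely truthy:

```python
from enum import Enum
from typing import Union


class DynamicType(str, Enum):
    # hypothetical enum mirroring the "local" value described in the PR
    LOCAL = "local"


def should_compute_on_the_fly(
    dynamic: Union[bool, DynamicType], is_local_param: bool
) -> bool:
    """Decide whether a quantization parameter is generated at runtime.

    dynamic=True  -> all parameters generated on the fly
    dynamic=False -> all parameters statically generated and saved to disk
    dynamic=local -> only local quantization parameters generated on the fly
    """
    if dynamic is True:
        return True
    if dynamic is False:
        return False
    # explicit check rather than truthiness: "local" would otherwise
    # evaluate to True in a bare `if dynamic:` test
    if dynamic == DynamicType.LOCAL:
        return is_local_param
    raise ValueError(f"Unsupported dynamic value: {dynamic!r}")
```

Because `DynamicType` subclasses `str`, the raw string `"local"` compares equal to `DynamicType.LOCAL`, so both spellings are accepted.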

Testing

  • All existing tests pass + new test cases
  • LLM Compressor test cases pass
  • NVFP4/NVFP4A16 recipes work as expected

Base automatically changed from activation_support to main May 28, 2025 02:04
@brian-dellabetta (Contributor) left a comment

Few comments

@kylesayrs (Contributor) left a comment

As long as we feel confident that there's no expected behavior resulting from "local" being evaluated to True, then this looks good

kylesayrs previously approved these changes May 28, 2025
@dsikka (Collaborator, Author) commented May 28, 2025

As long as we feel confident that there's no expected behavior resulting from "local" being evaluated to True, then this looks good

Yeah, I agree. Used the explicit check in quant_config for better readability.

@dsikka dsikka dismissed stale reviews from kylesayrs and brian-dellabetta via 6cb319b May 28, 2025 18:36
@dsikka dsikka enabled auto-merge (squash) May 28, 2025 18:36
@brian-dellabetta (Contributor) left a comment

nice nice nice

@dsikka dsikka merged commit 3f5705d into main May 28, 2025
1 check passed
@dsikka dsikka deleted the update_dynamic_conditions branch May 28, 2025 21:36
dsikka added a commit to vllm-project/llm-compressor that referenced this pull request May 28, 2025
SUMMARY:
- Requires neuralmagic/compressed-tensors#325
- Uses the new `tensor_group` strategy for nvfp4a16 quantization 
- Removes global_scale as an observer class attribute and instead passes it in as a function-call argument, similar to g_idx
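The pattern in the last bullet - threading `global_scale` through each call rather than storing it on the observer - could look roughly like the sketch below. `MinMaxObserver` and `calculate_qparams` are illustrative stand-ins, not the actual llm-compressor observer API:

```python
class MinMaxObserver:
    """Illustrative sketch of an observer that receives global_scale
    per call (like g_idx) instead of holding it as instance state."""

    def calculate_qparams(self, values, global_scale=None, g_idx=None):
        # min/max over the observed values
        lo, hi = min(values), max(values)
        if global_scale is not None:
            # fold the externally supplied global scale into the range
            lo, hi = lo / global_scale, hi / global_scale
        # map the range onto an unsigned 8-bit grid (toy example)
        scale = (hi - lo) / 255.0 or 1.0
        zero_point = round(-lo / scale)
        return scale, zero_point
```

Keeping `global_scale` out of the observer's state means the same observer instance can serve parameters with different global scales, which matches the stateless direction described above.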
3 participants